OTA-1965: WIP: pkg/cvo: Add Lightspeed proposal integration #1379
harche wants to merge 1 commit into openshift:main
Conversation
Skipping CI for Draft Pull Request.

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: harche. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Walkthrough

CVO now conditionally creates Lightspeed proposals after discovering available updates.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant CVO as Cluster Version Operator
    participant Readiness as Readiness System
    participant Creator as Proposal Creator
    participant K8s as Kubernetes API
    CVO->>K8s: Discover available updates
    CVO->>CVO: Check LightspeedProposals gate & non-HyperShift
    alt gate enabled & not HyperShift
        CVO->>Readiness: RunAll(current, target)
        par Concurrent readiness checks
            Readiness->>K8s: List/Get cluster resources (Nodes, PDBs, CRDs, CSVs, Subscriptions, ClusterOperator, ClusterVersion, Network, APIRequestCount, etc.)
        end
        Readiness-->>CVO: Aggregated readiness JSON
        CVO->>Creator: MaybeCreateProposal(current, target, readiness JSON)
        Creator->>K8s: Create Proposal CR (`proposals.agentic.openshift.io`)
        K8s-->>Creator: Created / AlreadyExists / Error
        Creator-->>CVO: Creation result
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 10 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
Actionable comments posted: 20
🧹 Nitpick comments (6)
pkg/readiness/check_test.go (1)
150-224: This test doesn't actually exercise `RunAll`.

The body manually iterates `checks` and recomputes `CheckResult` values — it's a copy of the production logic, so if `RunAll`'s orchestration (e.g. per-check timeout, concurrency, elapsed accounting, partial-data handling at lines 174-178) regresses, this test won't catch it. The `origAllChecks := AllChecks` + `_ = origAllChecks` dance on lines 155-160 makes this explicit: the test acknowledges it can't substitute checks and then proceeds anyway. Consider one of:

- Refactor `pkg/readiness/check.go` so `RunAll` accepts `[]Check` (or `AllChecks` is a package-level var that tests can override), then call the real `RunAll` with `[okCheck, failCheck, partialCheck]` and assert on the returned `*Output` (including `Meta.ChecksOK`/`ChecksErrored` and per-check `Data`/`Error` preservation, which is the behavior this file claims to cover).
- Or delete this test and rely on `TestAllChecksReturnsExpectedCount` + `checks_test.go` for coverage.

As per coding guidelines (`**/*_test.go`): "Don't introduce single-use variables just to name an intermediate value" — `origAllChecks` is a flag for the dead-code nature of the test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/check_test.go` around lines 150 - 224, The test TestRunAllMixedResults is not exercising the real RunAll orchestration (it re-implements the logic and only toys with AllChecks), so either make RunAll testable by accepting an injected []Check or make AllChecks a package-level variable tests can override and then call RunAll with []Check{okCheck, failCheck, partialCheck} and assert on the returned *Output (including Meta.ChecksOK/Meta.ChecksErrored and per-check Data/Error preservation), or remove the dead test; update the test to call RunAll (or replace it) instead of manually iterating checks so timeouts, concurrency, elapsed accounting and partial-data handling in RunAll are actually verified.

pkg/cvo/cvo.go (1)
400-406: Consider storing the dynamic client on `Operator` rather than routing it through `proposalCreator.DynamicClient()`.

Line 1238 retrieves the dynamic client via `optr.proposalCreator.DynamicClient()` purely so `readiness.RunAll` can use it. That forces `proposal.Creator` to leak its internal client as a public accessor for an unrelated caller. Initializing and keeping `dynamicClient` on `Operator` (and passing it into `NewCreator`) keeps the coupling one-way.

```diff
 if optr.shouldCreateLightspeedProposals() {
-	if dynamicClient, err := dynamic.NewForConfig(restConfig); err != nil {
+	dynamicClient, err := dynamic.NewForConfig(restConfig)
+	if err != nil {
 		klog.Warningf("Failed to create dynamic client for LightspeedProposal: %v", err)
 	} else {
+		optr.dynamicClient = dynamicClient
 		optr.proposalCreator = proposal.NewCreator(dynamicClient, proposal.DefaultConfig())
 	}
 }
```

Then `maybeCreateLightspeedProposal` uses `optr.dynamicClient` directly and `proposal.Creator` no longer needs a `DynamicClient()` getter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cvo/cvo.go` around lines 400 - 406, Store the dynamic client on the Operator (optr) instead of relying on proposal.Creator.DynamicClient(): when creating the proposal creator in the init block where proposal.NewCreator(dynamicClient, ...) is called, assign dynamicClient to a new optr.dynamicClient field and pass that same client into proposal.NewCreator; update maybeCreateLightspeedProposal and readiness.RunAll callers to use optr.dynamicClient directly and remove the DynamicClient() accessor from proposal.Creator so the proposal package no longer exposes its internal client.

test/cvo/readiness.go (2)
20-74: Add Ginkgo Labels and run `RunAll` once per suite, not per `It`.

Two concerns:

1. Missing Labels. As per coding guidelines (`test/**/*.go`): "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)". Since Lightspeed proposals are gated behind `CustomNoUpgrade`/`LightspeedProposals`, this suite should carry at least a `TechPreview` label (and likely `Serial` — it reads cluster-wide state that other parallel tests can perturb). Without a label, the test will run on every CI job including default-featureset runs where the prerequisites aren't guaranteed.
2. `readiness.RunAll(...)` is invoked in every `It` block (lines 57, 65, 92, 121, 150, 165, 186, 203, 213) — that's 9 live-cluster executions × 9 checks each = 81 round-trips against the apiserver per suite run, with non-negligible latency. Worse, the "ground truth" List calls happen at a different time than the readiness run, so node/pod counts can legitimately drift between the two snapshots and cause flakes.

Cache a single `*readiness.Output` in a shared variable (either in `BeforeEach`, or once in `BeforeAll` if the suite tolerates ordering), and have each `It` only consume it.

♻️ Proposed shape

```diff
-var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() {
+var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`,
+	g.Label("TechPreview", "Serial"), func() {
 	var (
 		dynamicClient  dynamic.Interface
 		kubeClient     kubernetes.Interface
 		configClient   *configv1client.ConfigV1Client
 		ctx            = context.TODO()
 		currentVersion string
 		targetVersion  string
+		output         *readiness.Output
 	)

 	g.BeforeEach(func() {
 		...
+		output = readiness.RunAll(ctx, dynamicClient, currentVersion, targetVersion)
 	})
```

Then drop the repeated `readiness.RunAll(...)` calls inside each `It` and reference `output` directly. Also, per guidelines, remember to run `make update` so `.openshift-tests-extension` metadata picks up the new test names/labels.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cvo/readiness.go` around lines 20 - 74, Add Ginkgo labels to the suite (e.g., "TechPreview" and "Serial" or the appropriate "CustomNoUpgrade"/"LightspeedProposals" labels) and stop calling readiness.RunAll(...) inside every It; instead, run readiness.RunAll(ctx, dynamicClient, currentVersion, targetVersion) once and store its result in a suite-scoped variable (e.g., var output *readiness.Output) populated in BeforeEach or a BeforeAll so each It reads from that cached output; update all It blocks to reference the cached output and remove the repeated RunAll calls, then run make update to refresh test metadata.
192-194: Unsafe type assertion on JSON-adjacent data.

`result.Data["blocking_pdbs"].([]map[string]any)` will panic (not fail gracefully) if the check returned nil, returned a different slice shape, or errored. On a healthy cluster with zero PDBs, the check may return an empty `[]map[string]any` or an untyped nil; either way, a panicked test provides less signal than a failed assertion.

```diff
-	blockingPDBs := result.Data["blocking_pdbs"].([]map[string]any)
-	o.Expect(len(blockingPDBs)).To(o.Equal(expectedBlocking),
-		"blocking PDB count should match actual blocking PDBs")
+	blockingPDBs, ok := result.Data["blocking_pdbs"].([]map[string]any)
+	o.Expect(ok).To(o.BeTrue(), "blocking_pdbs should be []map[string]any, got %T", result.Data["blocking_pdbs"])
+	o.Expect(len(blockingPDBs)).To(o.Equal(expectedBlocking),
+		"blocking PDB count should match actual blocking PDBs")
```

Same pattern applies to `result.Data["summary"].(map[string]any)` on line 125.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cvo/readiness.go` around lines 192 - 194, The code does an unsafe type assertion on result.Data["blocking_pdbs"] (and similarly result.Data["summary"]) which can panic; change to a safe assertion using the comma-ok pattern (e.g., v, ok := result.Data["blocking_pdbs"]; cast, ok2 := v.([]map[string]any)) or a type switch, assert with o.Expect that the value exists/is the expected type, and if nil treat it as an empty slice/map before using len or indexing; update the references to blockingPDBs and the summary extraction to use these safe checks so tests fail with assertions instead of panicking.

pkg/readiness/client_test.go (1)
102-126: Empty-map subtest asserts nothing; simplify substring search.

With `contains: []string{}` the inner loop never runs, so the "empty" case trivially passes regardless of what `FormatLabelSelector` returns. Either assert the expected output explicitly (e.g. `got == ""`), or drop the case. Also, the manual index walk at lines 114-119 is just `strings.Contains(got, s)`.

♻️ Proposed simplification

```diff
-	{
-		name:     "empty",
-		labels:   map[string]string{},
-		contains: []string{},
-	},
+	{
+		name:   "empty",
+		labels: map[string]string{},
+		want:   "",
+	},
 	...
-	got := FormatLabelSelector(tt.labels)
-	for _, s := range tt.contains {
-		found := false
-		for i := 0; i <= len(got)-len(s); i++ {
-			if got[i:i+len(s)] == s {
-				found = true
-				break
-			}
-		}
-		if !found {
-			t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
-		}
-	}
+	got := FormatLabelSelector(tt.labels)
+	for _, s := range tt.contains {
+		if !strings.Contains(got, s) {
+			t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
+		}
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/client_test.go` around lines 102 - 126, The "empty" subtest currently asserts nothing because contains: []string{} makes the inner loop skip; update the test to use strings.Contains(got, s) instead of the manual index walk and for the "empty" case assert the exact expected output (e.g. for the test case with name "empty" and labels map[string]string{} add an explicit check like if got != "" { t.Errorf(...)}), or drop that case—locate the table-driven tests around FormatLabelSelector in pkg/readiness/client_test.go and modify the test loop to call strings.Contains and add the explicit equality assertion for the "empty" entry.pkg/readiness/api_deprecations.go (1)
21-30: Use typed Kubernetes error checks for missing resources.

`strings.Contains(err.Error(), "not found")` is fragile and can miss error text variations or localization. Prefer `apierrors.IsNotFound(err)` to properly check for resource-not-found errors returned by the Kubernetes client.

♻️ Proposed refactor

```diff
 import (
 	"context"
 	"fmt"
-	"strings"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"

 	"k8s.io/client-go/dynamic"
 )
@@
 	arcs, err := ListResources(ctx, dc, GVRAPIRequestCount, "")
 	if err != nil {
 		// APIRequestCount may not be available on all clusters
-		if strings.Contains(err.Error(), "not found") {
+		if apierrors.IsNotFound(err) {
 			result["warning"] = "APIRequestCount resource not available"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/api_deprecations.go` around lines 21 - 30, Replace the fragile string check in the error handling block inside the function that lists APIRequestCounts (the code checking if strings.Contains(err.Error(), "not found")) with the typed Kubernetes error check apierrors.IsNotFound(err); import "k8s.io/apimachinery/pkg/api/errors" (alias apierrors if not already) and keep the existing behavior of returning the warning map and nil error when the resource is absent, otherwise return the wrapped fmt.Errorf as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 213-214: Replace the raw cross-build command for the
cluster-version-operator with the supported repository build target: remove the
CGO_ENABLED/GOOS/GOARCH go build line and instead instruct users to run the
repository's make build target (e.g., "make build") which produces the CVO
binary into the _output/ directory; if a raw cross-build must be kept, add an
explicit justification explaining why the dev flow cannot use "make build" and
document expected outputs and environment differences.
- Around line 237-244: Replace the hard-coded personal image reference
quay.io/harpatil/cvo-lightspeed:${TAG} with a configurable IMAGE
variable/placeholder used consistently across the build, push and oc patch
commands (the docker build, docker push lines and the value field in the oc
patch for deployment cluster-version-operator). Update the docs to reference
IMAGE or ${IMAGE} (and document that contributors should set IMAGE to their own
registry/tag) so the build/push and the patch operation use IMAGE instead of the
personal quay.io path.
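One way the suggested IMAGE indirection could read in the docs — a sketch only: the default value, the namespace, and the container index in the patch path are placeholders, not values from this PR:

```shell
# Contributors set IMAGE to their own registry/tag; the same variable
# feeds the build, the push, and the deployment patch.
IMAGE="${IMAGE:-quay.io/<your-org>/cvo-lightspeed:dev}"

docker build --platform linux/amd64 -f Dockerfile.dev -t "${IMAGE}" .
docker push "${IMAGE}"

# Patch path assumes the CVO container is the first in the pod spec.
oc patch deployment cluster-version-operator -n openshift-cluster-version \
  --type=json \
  -p "[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/image\", \"value\": \"${IMAGE}\"}]"
```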
In `@go.mod`:
- Line 112: The go.mod replace that points github.com/openshift/api to your
personal fork must be removed; edit go.mod to delete the line "replace
github.com/openshift/api => github.com/harche/api ..." and restore use of the
upstream module, then ensure the code referencing FeatureGateLightspeedProposals
in pkg/featuregates/featuregates.go either imports the upstream
github.com/openshift/api package (if the constant exists upstream) or you
instead vendor/define the specific constant locally or upstream the symbol to
the official repo before merging.
In `@install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml`:
- Around line 1-29: The ota-advisory-prompt ConfigMap (metadata name:
ota-advisory-prompt, namespace: openshift-lightspeed) is missing required
cluster-profile annotations and a kubernetes.io/description; add the standard
include.release.openshift.io/* profile selectors (e.g.
include.release.openshift.io/self-managed-high-availability and other profiles
your repository requires) under metadata.annotations and add
kubernetes.io/description with a short purpose string describing this
ConfigMap’s role as the OTA advisory prompt used by Lightspeed agents; apply the
same annotation/description pattern to the related manifests that reference this
ConfigMap (the resources defined in 51_lightspeed-agents.yaml and
52_lightspeed-workflows.yaml) so the CVO graph selector will correctly install
these resources on profile-scoped clusters.
In `@install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml`:
- Around line 11-20: Replace the mutable, external skill image reference
quay.io/harpatil/ocp-skills:latest with an immutable, release-managed image
(either a specific tag or an image digest) so the manifest is deterministic and
disconnected-cluster friendly; locate the skills entry that contains the image
key (the list under "skills" with image: quay.io/harpatil/ocp-skills:latest) and
update that image value to an approved, immutable release image (e.g., use your
release payload registry or a content-addressable digest) and ensure any
CI/release docs are updated to track the chosen immutable reference.
- Around line 4-7: This manifest for the Agent (metadata.name: ota-advisor) is
missing the required cluster-profile include annotations and a description; add
include.release.openshift.io/<profile-name>: "true" entries for every cluster
profile this feature should be included in (use the same profile keys your repo
uses, e.g., include.release.openshift.io/cluster: "true" or
include.release.openshift.io/optional: "true" as appropriate) alongside the
existing release.openshift.io/feature-gate annotation, and add a
kubernetes.io/description annotation that succinctly explains the Agent’s
purpose (e.g., "OTA Advisor Agent: collects and reports OTA proposal
recommendations for LightspeedProposals feature"). Ensure you update the
annotations block under metadata for the resource named ota-advisor.
In `@install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml`:
- Around line 4-7: The manifest for the Workflow resource named ota-advisory is
missing required install annotations and a description; update the
metadata.annotations to add the appropriate include.release.openshift.io
cluster-profile keys (e.g.,
include.release.openshift.io/self-managed-high-availability: "true" and any
necessary exclusion annotations consistent with
install/0000_00_cluster-version-operator_30_deployment.yaml and
_02_networkpolicy.yaml) and add kubernetes.io/description explaining the
resource's purpose; ensure you keep the existing
release.openshift.io/feature-gate: LightspeedProposals while adding the new
include.release.openshift.io/* entries and the kubernetes.io/description string.
In `@pkg/cvo/availableupdates.go`:
- Around line 186-190: The proposal creation is currently invoked inline
(optr.shouldCreateLightspeedProposals() ->
optr.maybeCreateLightspeedProposal(...)) which can block the available-updates
worker; change this to run asynchronously and bounded: when
shouldCreateLightspeedProposals() is true, spawn a goroutine to call
maybeCreateLightspeedProposal but pass a derived context with a short timeout
(context.WithTimeout) and use a global or package-level semaphore/channel to
bound concurrent proposal creations so you don’t spawn unlimited goroutines;
ensure maybeCreateLightspeedProposal observes the passed context, logs any
timeout/errors, and that the goroutine exits cleanly to avoid leaks so update
discovery remains non-blocking.
In `@pkg/cvo/cvo.go`:
- Around line 1219-1249: maybeCreateLightspeedProposal currently calls
readiness.RunAll with the caller's unbounded ctx which can block the
available-updates sync; modify maybeCreateLightspeedProposal to enforce an
overall deadline by creating a derived context with timeout (or spawn a
fire-and-forget goroutine using a context with timeout) before calling
readiness.RunAll, then call readiness.RunAll(ctxWithTimeout, dc,
optr.release.Version, targetVersion) and pass the resulting readinessJSON to
optr.proposalCreator.MaybeCreateProposal so the proposal creation cannot delay
the reconcile loop.
In `@pkg/proposal/proposal.go`:
- Around line 210-230: The loop that prints "Other recommended versions" should
operate on a precomputed filtered slice (e.g., others := filter updates where
u.Version != target) instead of iterating the raw updates so counting and
remaining calculations are correct; change the logic in the block that
references updates/target to first build others, only print the header when
len(others) > 0, iterate up to 5 entries from others, and compute remaining as
len(others)-count when truncating so the "... and N more" value is accurate
(update references to updates, target, count and remaining accordingly).
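The filter-then-print shape from the prompt above could look like this; the `update` type, `otherRecommended` name, and the 5-entry limit are stand-ins for the PR's actual structures:

```go
package main

import "fmt"

type update struct{ Version string }

// otherRecommended builds the "Other recommended versions" section from a
// pre-filtered slice, so the printed count and the "... and N more" remainder
// are computed against the same data instead of the raw updates list.
func otherRecommended(updates []update, target string, limit int) []string {
	var others []update
	for _, u := range updates {
		if u.Version != target {
			others = append(others, u)
		}
	}
	if len(others) == 0 {
		return nil // no header when there is nothing to list
	}
	lines := []string{"Other recommended versions:"}
	count := len(others)
	if count > limit {
		count = limit
	}
	for _, u := range others[:count] {
		lines = append(lines, "  - "+u.Version)
	}
	if remaining := len(others) - count; remaining > 0 {
		lines = append(lines, fmt.Sprintf("  ... and %d more", remaining))
	}
	return lines
}

func main() {
	ups := []update{{"4.17.1"}, {"4.17.2"}, {"4.17.3"}}
	for _, l := range otherRecommended(ups, "4.17.2", 5) {
		fmt.Println(l)
	}
}
```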
- Around line 136-153: The sort comparator for the candidates slice currently
violates strict weak ordering when versions are equal because it returns the
same result for both directions; update the sorting to use sort.SliceStable on
candidates and change the Less function to: first compare
candidates[i].version.Compare(candidates[j].version) as before; if equal, check
if candidates[i].kind != candidates[j].kind and prefer updateKindRecommended
when kinds differ; if both version and kind are equal, use candidates[i].raw <
candidates[j].raw as a deterministic tiebreaker; after this stable sort you can
simply return candidates[0].raw and candidates[0].kind as the best candidate.
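A sketch of that comparator, assuming the field names from the review (`version`, `kind`, `raw`); plain string comparison stands in for the real `semver.Version.Compare`, which this sketch does not reproduce:

```go
package main

import (
	"fmt"
	"sort"
)

const (
	updateKindRecommended = "Recommended"
	updateKindConditional = "Conditional"
)

type candidate struct {
	version string // stand-in: real code holds a parsed semver and uses Compare
	kind    string
	raw     string
}

// sortCandidates imposes a strict weak ordering, best-first: higher version
// wins; on equal versions, Recommended beats other kinds; on equal version
// and kind, raw is a deterministic tiebreaker. SliceStable keeps any
// still-equal elements in input order.
func sortCandidates(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		if cs[i].version != cs[j].version {
			return cs[i].version > cs[j].version // stand-in for Compare > 0
		}
		if cs[i].kind != cs[j].kind {
			return cs[i].kind == updateKindRecommended
		}
		return cs[i].raw < cs[j].raw
	})
}

func main() {
	cs := []candidate{
		{"4.17.2", updateKindConditional, "b"},
		{"4.17.2", updateKindRecommended, "a"},
		{"4.17.3", updateKindConditional, "c"},
	}
	sortCandidates(cs)
	fmt.Println(cs[0].raw, cs[0].kind) // best candidate is simply cs[0]
}
```

Note the equal-kind branch returns a result that flips when i and j swap, which is exactly what the broken comparator failed to guarantee.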
- Around line 248-257: The sanitize function currently leaves characters like
'+', '/', '_' and other non-alphanumeric/non-dash characters untouched,
producing invalid DNS-1035 labels; update sanitize(s string) to first normalize
to lowercase, then replace any character that is not [a-z0-9-] (e.g., using a
regexp or rune check) with '-', keep the existing replacements for '.' and ' '
(or rely on the general rule), preserve the 20-char truncation behavior and the
TrimRight(s, "-") call so trailing dashes are removed, and preserve the current
behavior allowing digit-start labels (do not force an alphabetic prefix).
In `@pkg/readiness/client.go`:
- Around line 3-7: The helper that constructs the label selector string
currently iterates a map and emits unsorted parts (e.g., "a=1,b=2" vs
"b=2,a=1"); to make output deterministic, collect the map entries into a slice
of "key=value" pairs, sort that slice using key as primary and value as
tiebreaker (i.e., sort by key then by value), then join with commas to produce
the selector; apply the same stable-sorting fix to the other occurrence that
builds label selector parts so both places produce deterministic, test-stable
output.
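A sketch of the deterministic version (the real function's surrounding code is not reproduced here):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FormatLabelSelector renders a label map as "k1=v1,k2=v2". Go randomizes
// map iteration order, so the parts are collected and sorted before joining;
// sorting the combined "key=value" strings gives a stable order with value
// acting as a natural tiebreaker.
func FormatLabelSelector(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(FormatLabelSelector(map[string]string{"b": "2", "a": "1"})) // a=1,b=2
}
```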
- Around line 153-163: CompareVersions currently masks parse errors by returning
0, so change its signature to return (int, error) instead of just int: parse a
and b with semver.ParseTolerant, if either parse fails return 0 and the parse
error, otherwise return va.Compare(vb), nil; update callers (notably the code in
api_deprecations.go that calls CompareVersions) to handle the error path
explicitly (e.g., treat parse errors as invalid input and skip or surface an
error) so parse failures are distinguishable from genuine equality.
In `@pkg/readiness/crd_compat.go`:
- Around line 15-67: CRDCompatCheck.Run currently ignores its current/target
params and never uses sectionErrors; either make it target-aware or explicitly
document/rename and remove unused vars. Option A (preferred): when target !=
current, consult a removed-version lookup (e.g.,
RemovedCRDVersionsForTarget(target) or a map you add) and flag CRDs whose
storedVersions appear in that removed set (update versionIssues accordingly),
and remove the dead sectionErrors or populate it with real errors; reference
CRDCompatCheck.Run, current, target, versionIssues, sectionErrors. Option B: if
you only want cluster-state hygiene, add a doc comment to CRDCompatCheck.Run
and/or rename the check to crd_stored_versions, remove current/target parameters
and the unused sectionErrors variable so the signature and function behavior
match.
In `@pkg/readiness/etcd_health.go`:
- Around line 33-50: The code counts pods as healthy based on phase=="Running",
which overstates etcd health; change the readiness check to inspect the pod
"Ready" condition instead (e.g., look through pod.Object.status.conditions for a
condition with type "Ready" and status "True" and use that to set
ready/healthyMembers) while still populating result["members"],
result["healthy_members"], and result["total_members"] as before; also update
TestEtcdHealthCheck in pkg/readiness/checks_test.go to add Ready pod conditions
to the fixture pods so the test reflects the new readiness logic.
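The Ready-condition lookup over unstructured pod data could take this shape; plain `map[string]any` stands in for `unstructured.Unstructured.Object`, and `podReady` is an illustrative name:

```go
package main

import "fmt"

// podReady reports whether a decoded pod object carries a Ready condition
// with status "True". Phase "Running" alone is not sufficient: a running
// etcd container can still be failing its readiness probe.
func podReady(pod map[string]any) bool {
	status, ok := pod["status"].(map[string]any)
	if !ok {
		return false
	}
	conditions, ok := status["conditions"].([]any)
	if !ok {
		return false
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]any)
		if !ok {
			continue
		}
		if cond["type"] == "Ready" && cond["status"] == "True" {
			return true
		}
	}
	return false
}

func main() {
	pod := map[string]any{
		"status": map[string]any{
			"phase":      "Running",
			"conditions": []any{map[string]any{"type": "Ready", "status": "False"}},
		},
	}
	fmt.Println(podReady(pod)) // false: Running but not Ready
}
```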
In `@pkg/readiness/node_capacity.go`:
- Line 10: The comment/docstring for NodeCapacityCheck is inaccurate: it says it
"assesses node readiness and resource headroom" but the code only reports
readiness and schedulability counts. Update the NodeCapacityCheck comment, any
kubernetes.io/description annotations and nearby doc strings to state that the
check reports node readiness and schedulability counts (e.g., "reports counts of
Ready and Schedulable nodes" or similar), removing the incorrect reference to
resource headroom so the description matches the implementation.
In `@pkg/readiness/olm_lifecycle.go`:
- Around line 149-167: The target compatibility logic in the loop (fields
compat, parsedTarget, parsedMax, parsedMin, entry["compatible_with_target"],
incompatibleWithTarget) only checks maxOCP; update it so when hasTarget is true
you evaluate both min and max bounds: parse minOCP (if non-empty) and set a
minOk flag based on parsedTarget.Compare(parsedMin) >= 0 (or true if no min),
parse maxOCP and set maxOk based on parsedTarget.Compare(parsedMax) <= 0 (or
true if no max), then set entry["compatible_with_target"] = minOk && maxOk and
increment incompatibleWithTarget only when that result is false. Ensure error
handling for semver.ParseTolerant leaves the corresponding bound as
not-applicable (treat as true) rather than skipping the combined check.
- Around line 64-65: The PackageManifest indexing currently uses indexByName
(pkgIndex := indexByName(pkgManifests)), which only keys by metadata.name and
can collide across catalogs; change the index to use a composite key consisting
of status.catalogSource + "|" + status.catalogSourceNamespace + "|" +
(status.packageName or metadata.name as a fallback) (either by replacing
indexByName with a new indexByPackageCompositeKey or extending it), and update
the Subscription lookup that builds the pkg key (the code that currently looks
up pkgIndex[...]) to construct the same composite from subscription.Spec.Source
+ "|" + subscription.Spec.SourceNamespace + "|" + (pkgStatusPackageName or
metadata.name fallback). Also update any other places using pkgIndex (the other
lookups noted in the comment) to use the composite-key logic so
channel/compatibility matching uses the correct CatalogSource.
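The composite-key indexing could be sketched as below; the `pkgManifest` struct flattens the unstructured fields the prompt names, and the function names are illustrative:

```go
package main

import "fmt"

type pkgManifest struct {
	Name                   string // metadata.name
	PackageName            string // status.packageName (may be empty)
	CatalogSource          string // status.catalogSource
	CatalogSourceNamespace string // status.catalogSourceNamespace
}

// compositeKey disambiguates packages that share a name across catalogs by
// keying on source, source namespace, and package name, falling back to
// metadata.name when status.packageName is unset.
func compositeKey(source, sourceNS, packageName, metaName string) string {
	name := packageName
	if name == "" {
		name = metaName
	}
	return source + "|" + sourceNS + "|" + name
}

func indexByCompositeKey(pkgs []pkgManifest) map[string]pkgManifest {
	idx := make(map[string]pkgManifest, len(pkgs))
	for _, p := range pkgs {
		idx[compositeKey(p.CatalogSource, p.CatalogSourceNamespace, p.PackageName, p.Name)] = p
	}
	return idx
}

func main() {
	pkgs := []pkgManifest{
		{Name: "etcd", CatalogSource: "community-operators", CatalogSourceNamespace: "openshift-marketplace"},
		{Name: "etcd", CatalogSource: "custom-catalog", CatalogSourceNamespace: "openshift-marketplace"},
	}
	idx := indexByCompositeKey(pkgs)
	fmt.Println(len(idx)) // 2 — no collision across catalogs
}
```

A Subscription lookup then builds the same key from `spec.source` and `spec.sourceNamespace`, so channel/compatibility matching resolves against the correct CatalogSource.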
In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code reads PDB thresholds using NestedString which
drops integer values; in pkg/readiness/pdb_drain.go replace the NestedString
calls for maxUnavailable and minAvailable with
unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable") and
unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable") to capture
the raw IntOrString values from pdb.Object, then convert those raw values to
strings only when building the output/issue map (or keep the raw types if
preferable) so numeric thresholds like 1 or 0 are preserved instead of becoming
empty strings.
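After switching to a raw field read, the value still needs rendering for the issue map; one way to handle the IntOrString shapes JSON decoding produces (the `thresholdString` helper is illustrative):

```go
package main

import "fmt"

// thresholdString renders a PDB threshold that decoded JSON may surface as a
// number (int64 or float64) or a string such as "25%". NestedString would
// silently return "" for the numeric cases, losing values like 0 or 1.
func thresholdString(raw any) string {
	switch v := raw.(type) {
	case nil:
		return ""
	case string:
		return v
	case int64:
		return fmt.Sprintf("%d", v)
	case float64:
		return fmt.Sprintf("%d", int64(v))
	default:
		return fmt.Sprintf("%v", v)
	}
}

func main() {
	spec := map[string]any{"maxUnavailable": int64(0)}
	fmt.Println(thresholdString(spec["maxUnavailable"])) // "0", not ""
}
```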
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 8966cf38-fdd4-4bf1-8ea3-9fcf63361158
⛔ Files ignored due to path filters (3)
- `go.sum` is excluded by `!**/*.sum`, `!go.sum`
- `vendor/github.com/openshift/api/features/features.go` is excluded by `!vendor/**`, `!**/vendor/**`
- `vendor/modules.txt` is excluded by `!vendor/**`, `!**/vendor/**`
📒 Files selected for processing (27)
`AGENTS.md`, `go.mod`, `install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml`, `install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml`, `install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml`, `pkg/cvo/availableupdates.go`, `pkg/cvo/cvo.go`, `pkg/cvo/status_test.go`, `pkg/featuregates/featuregates.go`, `pkg/proposal/proposal.go`, `pkg/proposal/proposal_test.go`, `pkg/readiness/api_deprecations.go`, `pkg/readiness/check.go`, `pkg/readiness/check_test.go`, `pkg/readiness/checks_test.go`, `pkg/readiness/client.go`, `pkg/readiness/client_test.go`, `pkg/readiness/cluster_conditions.go`, `pkg/readiness/crd_compat.go`, `pkg/readiness/etcd_health.go`, `pkg/readiness/network.go`, `pkg/readiness/node_capacity.go`, `pkg/readiness/olm_lifecycle.go`, `pkg/readiness/olm_lifecycle_test.go`, `pkg/readiness/operator_health.go`, `pkg/readiness/pdb_drain.go`, `test/cvo/readiness.go`
```shell
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor \
  -o _output/linux/amd64/cluster-version-operator ./cmd/cluster-version-operator/
```
Use the repository build target in the new workflow.
This section bypasses the supported CVO build path. Prefer make build, or explicitly document why this dev flow needs a raw cross-build.
Suggested doc adjustment

```diff
-CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor \
-  -o _output/linux/amd64/cluster-version-operator ./cmd/cluster-version-operator/
+make build
```

Based on learnings, "Use make build to build the CVO binary, outputs to _output/ directory".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor \
-  -o _output/linux/amd64/cluster-version-operator ./cmd/cluster-version-operator/
+make build
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` around lines 213 - 214, Replace the raw cross-build command for
the cluster-version-operator with the supported repository build target: remove
the CGO_ENABLED/GOOS/GOARCH go build line and instead instruct users to run the
repository's make build target (e.g., "make build") which produces the CVO
binary into the _output/ directory; if a raw cross-build must be kept, add an
explicit justification explaining why the dev flow cannot use "make build" and
document expected outputs and environment differences.
```shell
docker build --platform linux/amd64 -f Dockerfile.dev -t quay.io/harpatil/cvo-lightspeed:${TAG} .
docker push quay.io/harpatil/cvo-lightspeed:${TAG}

# 4. Deploy: single atomic patch (image + args + env) to avoid partial reverts
RELEASE_IMAGE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
API_HOST=$(oc get infrastructure cluster -o jsonpath='{.status.apiServerInternalURI}' | sed 's|https://||' | cut -d: -f1)
oc patch deployment cluster-version-operator -n openshift-cluster-version --type json -p "[
  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"quay.io/harpatil/cvo-lightspeed:${TAG}\"},
```
Avoid hard-coding a personal image repository in shared docs.
Use an IMAGE variable or placeholder so contributors do not copy a registry they cannot push to.
Suggested doc adjustment

```diff
 TAG="v$(date +%s)"
-docker build --platform linux/amd64 -f Dockerfile.dev -t quay.io/harpatil/cvo-lightspeed:${TAG} .
-docker push quay.io/harpatil/cvo-lightspeed:${TAG}
+IMAGE="quay.io/yourname/cvo-lightspeed:${TAG}"
+docker build --platform linux/amd64 -f Dockerfile.dev -t "${IMAGE}" .
+docker push "${IMAGE}"
@@
-  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"quay.io/harpatil/cvo-lightspeed:${TAG}\"},
+  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"${IMAGE}\"},
```

📝 Committable suggestion
```diff
-docker build --platform linux/amd64 -f Dockerfile.dev -t quay.io/harpatil/cvo-lightspeed:${TAG} .
-docker push quay.io/harpatil/cvo-lightspeed:${TAG}
+IMAGE="quay.io/yourname/cvo-lightspeed:${TAG}"
+docker build --platform linux/amd64 -f Dockerfile.dev -t "${IMAGE}" .
+docker push "${IMAGE}"
 # 4. Deploy: single atomic patch (image + args + env) to avoid partial reverts
 RELEASE_IMAGE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
 API_HOST=$(oc get infrastructure cluster -o jsonpath='{.status.apiServerInternalURI}' | sed 's|https://||' | cut -d: -f1)
 oc patch deployment cluster-version-operator -n openshift-cluster-version --type json -p "[
-  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"quay.io/harpatil/cvo-lightspeed:${TAG}\"},
+  {\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/image\",\"value\":\"${IMAGE}\"},
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` around lines 237 - 244, Replace the hard-coded personal image
reference quay.io/harpatil/cvo-lightspeed:${TAG} with a configurable IMAGE
variable/placeholder used consistently across the build, push and oc patch
commands (the docker build, docker push lines and the value field in the oc
patch for deployment cluster-version-operator). Update the docs to reference
IMAGE or ${IMAGE} (and document that contributors should set IMAGE to their own
registry/tag) so the build/push and the patch operation use IMAGE instead of the
personal quay.io path.
|
|
||
| replace github.com/onsi/ginkgo/v2 => github.com/openshift/onsi-ginkgo/v2 v2.6.1-0.20241205171354-8006f302fd12 | ||
|
|
||
| replace github.com/openshift/api => github.com/harche/api v0.0.0-20260414192630-b7a8e3d157cb |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Show all OpenShift API imports and direct Lightspeed feature-gate usages affected by the replace.
rg -n -C2 'github\.com/openshift/api|FeatureGateLightspeedProposals|agentic\.openshift\.io'

Repository: openshift/cluster-version-operator

Length of output: 50391
🏁 Script executed:
# Find where FeatureGateLightspeedProposals is actually used in non-vendor code
rg -n 'FeatureGateLightspeedProposals' --type=go | grep -v '^vendor/' | head -20

Repository: openshift/cluster-version-operator

Length of output: 249
🏁 Script executed:
# Search for proposal-related unstructured CR creation patterns
rg -n 'Proposal|unstructured|agentic' --type=go pkg/ | head -30

Repository: openshift/cluster-version-operator

Length of output: 3414
🏁 Script executed:
# Check the featuregates.go file to understand dependencies
fd -n 'featuregates.go' -x cat -n {}

Repository: openshift/cluster-version-operator

Length of output: 308
🏁 Script executed:
# View featuregates.go around the usage lines
sed -n '120,145p' pkg/featuregates/featuregates.go

Repository: openshift/cluster-version-operator

Length of output: 934
🏁 Script executed:
# Search for proposal CR creation patterns or resources
rg -n 'ProposalRequest|proposals\.|agentic\.openshift' pkg/ --type=go

Repository: openshift/cluster-version-operator

Length of output: 1448
🏁 Script executed:
# Check the imports in featuregates.go
head -30 pkg/featuregates/featuregates.go

Repository: openshift/cluster-version-operator
Length of output: 1734
Avoid merging with a module-wide replace to a personal fork.
The replace directive redirects all github.com/openshift/api imports—including stable types like config/v1—to github.com/harche/api. However, analysis shows that only the FeatureGateLightspeedProposals feature gate constant from the fork is actually used (in pkg/featuregates/featuregates.go lines 126 and 138), while proposal CRs are created as unstructured objects (agentic.openshift.io/v1alpha1/Proposal), not typed structs from the API.
Remove this replace directive before merging, or upstream the feature gate symbol to the main openshift/api repository instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` at line 112, The go.mod replace that points github.com/openshift/api
to your personal fork must be removed; edit go.mod to delete the line "replace
github.com/openshift/api => github.com/harche/api ..." and restore use of the
upstream module, then ensure the code referencing FeatureGateLightspeedProposals
in pkg/featuregates/featuregates.go either imports the upstream
github.com/openshift/api package (if the constant exists upstream) or you
instead vendor/define the specific constant locally or upstream the symbol to
the official repo before merging.
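Since the analysis above found that Proposal CRs are created as unstructured objects (`agentic.openshift.io/v1alpha1` `Proposal`) rather than typed structs, the typed dependency on the fork is avoidable. A minimal sketch of shaping such an object with plain nested maps, the same shape `unstructured.Unstructured` wraps; the `spec` field names here are hypothetical placeholders, not the actual CRD schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildProposal assembles a Proposal CR body as nested maps, with no
// compile-time dependency on openshift/api types.
func buildProposal(current, target string) map[string]any {
	return map[string]any{
		"apiVersion": "agentic.openshift.io/v1alpha1",
		"kind":       "Proposal",
		"metadata": map[string]any{
			"name": fmt.Sprintf("upgrade-%s-to-%s", current, target),
		},
		"spec": map[string]any{ // hypothetical fields for illustration
			"currentVersion": current,
			"targetVersion":  target,
		},
	}
}

func main() {
	// In the real operator this map would back an unstructured.Unstructured
	// handed to a dynamic client Create call; here we just render it.
	b, _ := json.MarshalIndent(buildProposal("4.18.0", "4.19.0"), "", "  ")
	fmt.Println(string(b))
}
```

This illustrates why only the feature-gate constant, not the whole API module, needs to come from a fork.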
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ota-advisory-prompt
  namespace: openshift-lightspeed
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
data:
  prompt: |
    You are an OpenShift upgrade advisor. Analyze the cluster readiness
    data in the proposal request and produce an upgrade risk assessment.

    The request contains a "Cluster Readiness Data" section with a JSON
    block. This was collected by the Cluster Version Operator — do not
    re-collect it. Parse the JSON, evaluate each check's results, and
    classify findings as blockers, warnings, or informational.

    Use the ota-upgrade-advisor skill for the decision framework and
    blocker classification rules. When findings need deeper investigation,
    use prometheus, platform-docs, redhat-support, or product-lifecycle
    skills.

    When the readiness data includes olm_operator_lifecycle results, use
    the product-lifecycle skill to cross-reference each operator's package
    name against the Red Hat Product Life Cycle API. Report support phase,
    EOL dates, and OCP compatibility from PLCC alongside the OLM data.

    Do not guess or assume cluster state. Do not execute upgrade commands.
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm the sibling lightspeed manifests carry the expected profile annotations so this file isn't the only outlier.
fd -t f 'lightspeed' install
fd -t f 'lightspeed' install --exec rg -nH 'include.release.openshift.io|kubernetes.io/description|release.openshift.io/feature-gate' {}

Repository: openshift/cluster-version-operator

Length of output: 670
🏁 Script executed:
for file in install/0000_00_cluster-version-operator_5{0,1,2}_lightspeed-*.yaml; do
  if [ -f "$file" ]; then
    echo "=== $file ==="
    cat "$file"
    echo ""
  fi
done

Repository: openshift/cluster-version-operator
Length of output: 9278
Missing cluster-profile annotations and kubernetes.io/description.
All three related lightspeed manifest files (50_lightspeed-prompts.yaml, 51_lightspeed-agents.yaml, and 52_lightspeed-workflows.yaml) lack required annotations per coding guidelines for install/**:
- "All manifests must have appropriate cluster-profile annotations (include.release.openshift.io/self-managed-high-availability, etc.)"
- "Use kubernetes.io/description annotations to explain the resource's purpose"
Without cluster-profile annotations, the CVO manifest graph selector won't apply this ConfigMap on profile-scoped clusters, breaking the Agent manifest reference to ota-advisory-prompt at runtime.
🛡️ Suggested metadata block

```diff
 metadata:
   name: ota-advisory-prompt
   namespace: openshift-lightspeed
   annotations:
     release.openshift.io/feature-gate: LightspeedProposals
+    include.release.openshift.io/self-managed-high-availability: "true"
+    include.release.openshift.io/single-node-developer: "true"
+    kubernetes.io/description: |
+      System prompt used by the Lightspeed OTA advisory agent to analyze
+      Cluster Version Operator readiness data and classify upgrade findings
+      as blockers, warnings, or informational. Referenced by the Lightspeed
+      Agent via spec.systemPromptRef.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml` around
lines 1 - 29, The ota-advisory-prompt ConfigMap (metadata name:
ota-advisory-prompt, namespace: openshift-lightspeed) is missing required
cluster-profile annotations and a kubernetes.io/description; add the standard
include.release.openshift.io/* profile selectors (e.g.
include.release.openshift.io/self-managed-high-availability and other profiles
your repository requires) under metadata.annotations and add
kubernetes.io/description with a short purpose string describing this
ConfigMap’s role as the OTA advisory prompt used by Lightspeed agents; apply the
same annotation/description pattern to the related manifests that reference this
ConfigMap (the resources defined in 51_lightspeed-agents.yaml and
52_lightspeed-workflows.yaml) so the CVO graph selector will correctly install
these resources on profile-scoped clusters.
```yaml
metadata:
  name: ota-advisor
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
```
Add the required release-profile and description annotations.
This manifest is feature-gated, but it still needs the appropriate include.release.openshift.io/* cluster-profile annotations and a kubernetes.io/description explaining the Agent’s purpose.
As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml` around
lines 4 - 7, This manifest for the Agent (metadata.name: ota-advisor) is missing
the required cluster-profile include annotations and a description; add
include.release.openshift.io/<profile-name>: "true" entries for every cluster
profile this feature should be included in (use the same profile keys your repo
uses, e.g., include.release.openshift.io/cluster: "true" or
include.release.openshift.io/optional: "true" as appropriate) alongside the
existing release.openshift.io/feature-gate annotation, and add a
kubernetes.io/description annotation that succinctly explains the Agent’s
purpose (e.g., "OTA Advisor Agent: collects and reports OTA proposal
recommendations for LightspeedProposals feature"). Ensure you update the
annotations block under metadata for the resource named ota-advisor.
```go
podStatuses := make([]map[string]any, 0, len(etcdPods))
healthyMembers := 0
for _, pod := range etcdPods {
	phase := NestedString(pod.Object, "status", "phase")
	ready := phase == "Running"
	if ready {
		healthyMembers++
	}
	podStatuses = append(podStatuses, map[string]any{
		"name":  pod.GetName(),
		"node":  NestedString(pod.Object, "spec", "nodeName"),
		"phase": phase,
		"ready": ready,
	})
}
result["members"] = podStatuses
result["healthy_members"] = healthyMembers
result["total_members"] = len(etcdPods)
```
phase == "Running" is not the same as "healthy etcd member".
A pod in Running phase can still have unready containers (startup/readiness probe failing, member catching up, leader election, fsync stalls). Counting such a pod as a healthy etcd member over-reports healthy_members and may cause the Lightspeed advisor to classify an unsafe upgrade as safe.
Use the Ready pod condition instead (or in addition):
```diff
-for _, pod := range etcdPods {
-	phase := NestedString(pod.Object, "status", "phase")
-	ready := phase == "Running"
-	if ready {
-		healthyMembers++
-	}
-	podStatuses = append(podStatuses, map[string]any{
-		"name":  pod.GetName(),
-		"node":  NestedString(pod.Object, "spec", "nodeName"),
-		"phase": phase,
-		"ready": ready,
-	})
-}
+for _, pod := range etcdPods {
+	phase := NestedString(pod.Object, "status", "phase")
+	ready := false
+	for _, c := range NestedSlice(pod.Object, "status", "conditions") {
+		cm, ok := c.(map[string]interface{})
+		if !ok {
+			continue
+		}
+		if NestedString(cm, "type") == "Ready" && NestedString(cm, "status") == string(ConditionTrue) {
+			ready = true
+			break
+		}
+	}
+	if ready {
+		healthyMembers++
+	}
+	podStatuses = append(podStatuses, map[string]any{
+		"name":  pod.GetName(),
+		"node":  NestedString(pod.Object, "spec", "nodeName"),
+		"phase": phase,
+		"ready": ready,
+	})
+}
```

Note: this will require updating TestEtcdHealthCheck in pkg/readiness/checks_test.go to add Ready pod conditions to the fixture pods (currently only status.phase is set, so healthy_members would drop to 0).
📝 Committable suggestion
```go
podStatuses := make([]map[string]any, 0, len(etcdPods))
healthyMembers := 0
for _, pod := range etcdPods {
	phase := NestedString(pod.Object, "status", "phase")
	ready := false
	for _, c := range NestedSlice(pod.Object, "status", "conditions") {
		cm, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if NestedString(cm, "type") == "Ready" && NestedString(cm, "status") == string(ConditionTrue) {
			ready = true
			break
		}
	}
	if ready {
		healthyMembers++
	}
	podStatuses = append(podStatuses, map[string]any{
		"name":  pod.GetName(),
		"node":  NestedString(pod.Object, "spec", "nodeName"),
		"phase": phase,
		"ready": ready,
	})
}
result["members"] = podStatuses
result["healthy_members"] = healthyMembers
result["total_members"] = len(etcdPods)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/etcd_health.go` around lines 33 - 50, The code counts pods as
healthy based on phase=="Running", which overstates etcd health; change the
readiness check to inspect the pod "Ready" condition instead (e.g., look through
pod.Object.status.conditions for a condition with type "Ready" and status "True"
and use that to set ready/healthyMembers) while still populating
result["members"], result["healthy_members"], and result["total_members"] as
before; also update TestEtcdHealthCheck in pkg/readiness/checks_test.go to add
Ready pod conditions to the fixture pods so the test reflects the new readiness
logic.
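The distinction the review draws between phase and the Ready condition can be reduced to a small, self-contained sketch. This is illustrative only: it uses plain nested maps in place of unstructured pod objects, and `podReady` is a hypothetical helper, not the PR's code:

```go
package main

import "fmt"

// podReady scans status.conditions for type=Ready/status=True. A pod in the
// Running phase can still be unready (failing probes, member catching up),
// so phase alone must not count as "healthy".
func podReady(pod map[string]any) bool {
	status, _ := pod["status"].(map[string]any)
	conds, _ := status["conditions"].([]any)
	for _, c := range conds {
		cm, ok := c.(map[string]any)
		if !ok {
			continue
		}
		if cm["type"] == "Ready" && cm["status"] == "True" {
			return true
		}
	}
	return false
}

func main() {
	// Running phase, but the Ready condition is False.
	running := map[string]any{"status": map[string]any{
		"phase": "Running",
		"conditions": []any{
			map[string]any{"type": "Ready", "status": "False"},
		},
	}}
	fmt.Println(podReady(running)) // false
}
```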
```go
	"k8s.io/client-go/dynamic"
)

// NodeCapacityCheck assesses node readiness and resource headroom.
```
Keep the check description aligned with its output.
This check does not calculate resource headroom; it reports readiness and schedulability counts.
Suggested wording

```diff
-// NodeCapacityCheck assesses node readiness and resource headroom.
+// NodeCapacityCheck assesses node readiness and schedulability.
```

As per coding guidelines, "When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior."
📝 Committable suggestion
```diff
-// NodeCapacityCheck assesses node readiness and resource headroom.
+// NodeCapacityCheck assesses node readiness and schedulability.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/node_capacity.go` at line 10, The comment/docstring for
NodeCapacityCheck is inaccurate: it says it "assesses node readiness and
resource headroom" but the code only reports readiness and schedulability
counts. Update the NodeCapacityCheck comment, any kubernetes.io/description
annotations and nearby doc strings to state that the check reports node
readiness and schedulability counts (e.g., "reports counts of Ready and
Schedulable nodes" or similar), removing the incorrect reference to resource
headroom so the description matches the implementation.
```go
csvIndex := indexByNamespacedName(csvs)
pkgIndex := indexByName(pkgManifests)
```
🧩 Analysis chain
🏁 Script executed:
cat -n pkg/readiness/olm_lifecycle.go | head -80

Repository: openshift/cluster-version-operator

Length of output: 3095
🏁 Script executed:
# Look at the lines mentioned in the review
sed -n '60,70p' pkg/readiness/olm_lifecycle.go
sed -n '140,150p' pkg/readiness/olm_lifecycle.go
sed -n '200,220p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1533
🏁 Script executed:
# Find the indexByName function definition
rg -n "func indexByName" pkg/readiness/olm_lifecycle.go -A 10

Repository: openshift/cluster-version-operator

Length of output: 994
🏁 Script executed:
# Check if indexByName is used elsewhere
rg -n "indexByName" pkg/

Repository: openshift/cluster-version-operator

Length of output: 704
🏁 Script executed:
# Look at the Subscription and PackageManifest structures
rg -n "spec.*source" pkg/readiness/olm_lifecycle.go -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 369
🏁 Script executed:
# Check what fields are extracted from PackageManifests
rg -n "NestedString.*pm\|NestedSlice.*pm" pkg/readiness/olm_lifecycle.go -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look at the context where pkgIndex is used (around line 142-145)
sed -n '135,155p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator
Length of output: 792
🏁 Script executed:
# Check for OLM PackageManifest API specification or comments
rg -n "PackageManifest\|GVRPackageManifest" pkg/readiness/ -B 2 -A 2 | head -40

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Search for any documentation about PackageManifest fields
fd -e "*.go" -e "*.md" pkg/ | xargs rg -l "catalogSource\|packageName" 2>/dev/null | head -5

Repository: openshift/cluster-version-operator

Length of output: 399
🏁 Script executed:
# Look at the actual usage and what's being extracted from PackageManifest
sed -n '130,180p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1616
🏁 Script executed:
# Check if there are any tests that show PackageManifest structure
find . -name "*test*.go" -type f | xargs grep -l "PackageManifest\|olm_lifecycle" 2>/dev/null | head -5

Repository: openshift/cluster-version-operator

Length of output: 146
🏁 Script executed:
# Look for any comments or documentation about PackageManifest fields
rg -n "status.*catalogSource\|catalogSourceNamespace\|packageName" pkg/readiness/ -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Check the API resource definition
rg -n "GVRPackageManifest\|Package.*Manifest" pkg/readiness/ -B 1 -A 1

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Get more context on how Subscriptions reference PackageManifests
sed -n '78,95p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 708
🏁 Script executed:
# Check the test file to understand PackageManifest structure
cat -n pkg/readiness/olm_lifecycle_test.go | head -100

Repository: openshift/cluster-version-operator
Length of output: 4153
🏁 Script executed:
# Search for PackageManifest examples in tests
rg -n "PackageManifest\|packagemanifest" pkg/readiness/olm_lifecycle_test.go -A 5 -B 2 | head -80

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Check what GVRPackageManifest is
rg -n "GVRPackageManifest\|GVR" pkg/readiness/ -B 1 -A 1 | grep -A 1 -B 1 "PackageManifest"

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look at the resource definitions
fd -name "*.go" pkg/readiness | xargs grep -n "GVR.*=" | head -20

Repository: openshift/cluster-version-operator

Length of output: 308
🏁 Script executed:
# Get the resource definitions
grep -rn "GVR.*=" pkg/readiness/ | head -30

Repository: openshift/cluster-version-operator

Length of output: 3385
🏁 Script executed:
# Look for more context about how multiple catalogs work
rg -n "source\|catalog" pkg/readiness/olm_lifecycle_test.go -B 2 -A 2 | head -60

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Check the full test to see if it covers multiple catalog sources
cat -n pkg/readiness/olm_lifecycle_test.go | tail -n +100 | head -100

Repository: openshift/cluster-version-operator

Length of output: 4162
🏁 Script executed:
# Check if there are any other test cases with multiple catalogs or sources
cat -n pkg/readiness/olm_lifecycle_test.go | tail -n +140

Repository: openshift/cluster-version-operator

Length of output: 11741
🏁 Script executed:
# Search for any documentation or comments about catalog sources
rg -n "catalog\|Catalog" pkg/readiness/ -B 2 -A 2

Repository: openshift/cluster-version-operator
Length of output: 60
🏁 Script executed:
# Check if PackageManifest has status fields mentioned in the proposed fix
rg -n "catalogSource\|packageName" pkg/readiness/

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look for OLM API documentation or references
rg -n "operators.coreos.com\|packages.operators.coreos" pkg/ -B 1 -A 1 | head -40

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Search for more complete PackageManifest examples in tests
cat -n pkg/readiness/olm_lifecycle_test.go | grep -A 20 "PackageManifest"

Repository: openshift/cluster-version-operator

Length of output: 1658
🏁 Script executed:
# Check if there are any comments or documentation about PackageManifest status fields
rg -n "status\|channel\|manifest" pkg/readiness/olm_lifecycle.go | grep -i "package\|manifest" | head -20

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look at the extractOCPCompat function to understand PackageManifest structure better
sed -n '213,260p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1460
🏁 Script executed:
# Check if the proposed fields (status.catalogSource) are actually used in the codebase
rg -rn "catalogSource\|packageName" pkg/

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Check OLM API documentation online - search for PackageManifest API structure
# First, let's look for any imports or references to OLM packages
rg -n "operators.coreos.com\|packages.operators.coreos" pkg/readiness/ -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look at the entire olm_lifecycle.go file to understand the full context
wc -l pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator
Length of output: 112
🏁 Script executed:
# Check the key lines in detail - around 142-145 where pkgIndex is used
sed -n '135,175p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1235
🏁 Script executed:
# Look for any documentation or comments about how PackageManifest maps to Subscription
rg -n "Subscription\|PackageManifest" pkg/readiness/olm_lifecycle.go | head -20

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Check if there are any other files that document PackageManifest structure
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs grep -l "PackageManifest\|catalogSource" 2>/dev/null | head -10

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look for any OLM-related documentation or vendor code
find . -path "*/vendor/*" -prune -o -type f -name "*.go" -print | xargs grep -l "packagemanifests\|PackageManifest" 2>/dev/null | head -10

Repository: openshift/cluster-version-operator

Length of output: 205
🏁 Script executed:
# Check the complete file to see if there's any comment about the correlation logic
sed -n '1,50p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1642
🏁 Script executed:
# Check if there's any real-world scenario where package names collide
rg -n "unique\|collid\|duplicate\|conflict" pkg/readiness/olm_lifecycle.go -B 2 -A 2

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Look for any issue or bug tracker references
rg -n "TODO\|FIXME\|BUG\|XXX\|HACK" pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 60
🏁 Script executed:
# Let's check the OLM API specification to verify PackageManifest has catalogSource fields
# First, check if there are any comments or documentation about this
cat -n pkg/readiness/olm_lifecycle.go | sed -n '20,30p'

Repository: openshift/cluster-version-operator

Length of output: 844
🏁 Script executed:
# Look at the complete logic to understand what the code is trying to do
sed -n '78,120p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator

Length of output: 1618
🏁 Script executed:
# Check if there are integration tests or examples that show real PackageManifest structure
find . -type f -name "*.yaml" -o -name "*.json" | xargs grep -l "PackageManifest" 2>/dev/null | head -5

Repository: openshift/cluster-version-operator
Length of output: 60
🌐 Web query:
OLM PackageManifest API spec catalogSource sourceNamespace
💡 Result:
The PackageManifest is a Kubernetes Custom Resource Definition (CRD) in the OLM (Operator Lifecycle Manager) ecosystem, part of the OperatorHub APIs. It provides metadata about an Operator package, including available channels and versions from a specific catalog source. It is typically created and managed by OLM when a CatalogSource is deployed.

API Group/Version: packages.operators.coreos.com/v1

Key fields in status (PackageManifestSpec is typically empty as it's status-driven):
- catalogSource: string - Name of the CatalogSource this package belongs to (e.g., "certified-operators").
- catalogSourceNamespace: string - Namespace of the owning CatalogSource (typically "olm" or "openshift-marketplace").
- catalogSourcePublisher: string - Publisher of the catalog (e.g., "Red Hat Operators").
- catalogSourceDisplayName: string - Display name of the catalog.
- packageName: string - Name of the Operator package (e.g., "etcd").
- defaultChannel: string - Default channel for installation (e.g., "stable").
- channels: array of objects - Each channel has name, currentCSV, entries (listing CSVs/versions with upgrade paths).

PackageManifests are usually namespaced (in the CatalogSource's namespace) and queried via kubectl get packagemanifest -n olm. They power the OperatorHub UI and Subscription creation, where Subscriptions reference source (catalogSource name) and sourceNamespace. Note: PackageManifest format is legacy; modern OLM prefers bundle format, but PackageManifest CRs still exist for compatibility.

Example usage in Subscription:
spec:
  source: my-catalog
  sourceNamespace: olm
Citations:
- 1: https://docs.openshift.com/container-platform/4.15/rest_api/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1.html
- 2: https://docs.redhat.com/en/documentation/openshift_container_platform/4.14/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
- 3: https://docs.redhat.com/en/documentation/openshift_container_platform/4.15/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
- 4: https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
- 5: https://docs.okd.io/latest/rest_api/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1.html
🌐 Web query:
operators.coreos.com PackageManifest status fields
💡 Result:
The PackageManifest resource in the packages.operators.coreos.com/v1 API has a status object (PackageManifestStatus) with the following top-level fields: - catalogSource (string, required): Name of the CatalogSource this package belongs to. - catalogSourceDisplayName (string, required): Display name of the CatalogSource. - catalogSourceNamespace (string, required): Namespace of the owning CatalogSource. - catalogSourcePublisher (string, required): Publisher of the CatalogSource. - packageName (string, required): Name of the overall package (e.g., "etcd"). - channels (array of objects, required): Declared channels for the package (e.g., "stable", "alpha"). Each channel object includes: - name (string, required): Channel name. - currentCSV (string, required): Reference to the current ClusterServiceVersion (CSV) for the channel. - currentCSVDesc (object): Description of the current CSV, including displayName, description, version, icon, keywords, maintainers, links, relatedImages, installModes, maturity, minKubeVersion, etc. - entries (array of objects): List of all CSVs in the channel with upgrade paths; each entry has name, version, replaces, skipRange, and optional deprecation. - deprecation (object): Deprecation info for the channel. - defaultChannel (string, required): Default channel name. - provider (object): Provider information with name and URL. - deprecation (object): Top-level deprecation info for the package, including message and time. The spec is typically empty or minimal, as PackageManifests are primarily status-driven by Operator Lifecycle Manager (OLM) from CatalogSources. This schema is consistent across OpenShift/OKD 4.12-4.17 docs.
Citations:
- 1: https://docs.redhat.com/en/documentation/openshift_container_platform/4.15/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
- 2: https://docs.openshift.com/container-platform/4.15/rest_api/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1.html
- 3: https://docs.openshift.com/container-platform/4.12/rest_api/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1.html
- 4: https://docs.okd.io/4.16/rest_api/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1.html
- 5: https://docs.redhat.com/en/documentation/openshift_container_platform/4.12/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
- 6: https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html/operatorhub_apis/packagemanifest-packages-operators-coreos-com-v1
🏁 Script executed:
# Verify that PackageManifests are indeed cluster-scoped (no namespace)
# and would need the status fields for proper correlation
sed -n '49,65p' pkg/readiness/olm_lifecycle.go

Repository: openshift/cluster-version-operator
Length of output: 515
🏁 Script executed:
# Check how ListResources works for PackageManifests
rg -n "ListResources\|GVRPackageManifest" pkg/readiness/client.go -A 5 -B 2
Length of output: 60
Use composite key (source + sourceNamespace + packageName) when indexing PackageManifests.
Package names are not globally unique across catalogs. The current code indexes PackageManifests by metadata.name only, but Subscriptions specify both spec.source and spec.sourceNamespace to identify which catalog they depend on. Without matching on these fields, the code may attach compatibility and channel data from the wrong catalog if the same package name appears in multiple CatalogSources.
Change the PackageManifest index to use a composite key built from status.catalogSource, status.catalogSourceNamespace, and status.packageName (or metadata.name as fallback). Update the lookup at line 143 to construct the same key from the Subscription's source and sourceNamespace.
Also applies to: 142-145, 204-211
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/olm_lifecycle.go` around lines 64 - 65, The PackageManifest
indexing currently uses indexByName (pkgIndex := indexByName(pkgManifests)),
which only keys by metadata.name and can collide across catalogs; change the
index to use a composite key consisting of status.catalogSource + "|" +
status.catalogSourceNamespace + "|" + (status.packageName or metadata.name as a
fallback) (either by replacing indexByName with a new indexByPackageCompositeKey
or extending it), and update the Subscription lookup that builds the pkg key
(the code that currently looks up pkgIndex[...]) to construct the same composite
from subscription.Spec.Source + "|" + subscription.Spec.SourceNamespace + "|" +
(pkgStatusPackageName or metadata.name fallback). Also update any other places
using pkgIndex (the other lookups noted in the comment) to use the composite-key
logic so channel/compatibility matching uses the correct CatalogSource.
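The composite-key indexing the comment describes can be sketched as below. The names `pkgKey`, `indexByCompositeKey`, and the `pkgManifest` struct are illustrative stand-ins, not the repository's actual helpers, and the struct flattens the unstructured `status` fields for clarity:

```go
package main

import "fmt"

// pkgKey builds a composite index key from catalog source, its namespace,
// and the package name, mirroring how a Subscription's spec.source and
// spec.sourceNamespace identify its catalog.
func pkgKey(source, sourceNamespace, packageName string) string {
	return source + "|" + sourceNamespace + "|" + packageName
}

// pkgManifest is a flattened stand-in for a PackageManifest's status fields.
type pkgManifest struct {
	Name            string // metadata.name, used as fallback
	CatalogSource   string // status.catalogSource
	CatalogSourceNS string // status.catalogSourceNamespace
	PackageName     string // status.packageName
}

// indexByCompositeKey indexes manifests so the same package name appearing
// in two catalogs no longer collides on a single map key.
func indexByCompositeKey(manifests []pkgManifest) map[string]pkgManifest {
	idx := make(map[string]pkgManifest, len(manifests))
	for _, m := range manifests {
		name := m.PackageName
		if name == "" {
			name = m.Name // fall back to metadata.name
		}
		idx[pkgKey(m.CatalogSource, m.CatalogSourceNS, name)] = m
	}
	return idx
}

func main() {
	manifests := []pkgManifest{
		{Name: "etcd", CatalogSource: "community-operators", CatalogSourceNS: "openshift-marketplace", PackageName: "etcd"},
		{Name: "etcd", CatalogSource: "certified-operators", CatalogSourceNS: "openshift-marketplace", PackageName: "etcd"},
	}
	idx := indexByCompositeKey(manifests)
	// A Subscription carrying source/sourceNamespace now resolves unambiguously.
	m := idx[pkgKey("certified-operators", "openshift-marketplace", "etcd")]
	fmt.Println(m.CatalogSource) // certified-operators
}
```

The lookup side would build the same key from `subscription.Spec.Source`, `subscription.Spec.SourceNamespace`, and the package name, so both sides of the join agree.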
maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
minAvailable := NestedString(pdb.Object, "spec", "minAvailable")
🧩 Analysis chain
🌐 Web query:
Kubernetes policy/v1 PodDisruptionBudgetSpec minAvailable maxUnavailable IntOrString
💡 Result:
In Kubernetes policy/v1 PodDisruptionBudgetSpec, both minAvailable and maxUnavailable are fields of type IntOrString. They are mutually exclusive; only one can be specified in a single PodDisruptionBudget. IntOrString is a type that can hold either an int32 (absolute number of pods, e.g., 2) or a string (percentage, e.g., "50%"). When serialized to JSON or YAML, it represents the inner type directly. - minAvailable (IntOrString): Specifies the minimum number of pods that must remain available after an eviction. For example, minAvailable: 2 or minAvailable: "50%" (at least 50% available). - maxUnavailable (IntOrString): Specifies the maximum number of pods that can be unavailable after an eviction. For example, maxUnavailable: 1 or maxUnavailable: "20%" (at most 20% unavailable). Available since Kubernetes 1.7. Examples: apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: example-min spec: minAvailable: 2 # or "90%" selector: matchLabels: app: example --- apiVersion: policy/v1 kind: PodDisruptionBudget metadata: name: example-max spec: maxUnavailable: 1 # or "10%" selector: matchLabels: app: example Setting both fields results in validation error: "minAvailable and maxUnavailable cannot be both set".
Citations:
- 1: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
- 2: https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/pod-disruption-budget-v1
- 3: http://www.k8s.io/docs/reference/kubernetes-api/policy-resources/pod-disruption-budget-v1/
- 4: https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/pod-disruption-budget-v1/
🏁 Script executed:
# First, find and examine the NestedString function definition
rg -n "func NestedString" --type go
Length of output: 678
🏁 Script executed:
# Then check the actual code at the file and lines mentioned
cat -n pkg/readiness/pdb_drain.go | head -40
Length of output: 1716
🏁 Script executed:
# Look for imports and context around NestedString usage
rg -n "NestedString" pkg/readiness/pdb_drain.go -B 2 -A 2
Length of output: 370
🏁 Script executed:
# Check the NestedString implementation in the codebase (pkg/readiness/client.go)
cat -n pkg/readiness/client.go | sed -n '100,130p'
Length of output: 1173
🏁 Script executed:
# Also check what other NestedInt64 looks like to understand the pattern
rg -n "func NestedInt64" pkg/readiness/client.go -A 10
Length of output: 479
🏁 Script executed:
# Check the kubernetes unstructured NestedString implementation
cat -n vendor/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/helpers.go | sed -n '75,95p'
Length of output: 948
🏁 Script executed:
# Also check NestedFieldNoCopy to understand what the proposed fix does
rg -n "func NestedFieldNoCopy" vendor/k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/helpers.go -A 15
Length of output: 607
🏁 Script executed:
# Let's verify how IntOrString values appear when unstructured
# Check if there are any tests or examples showing how PDB objects are structured
rg -n "IntOrString\|minAvailable\|maxUnavailable" pkg/readiness/ --type go
Length of output: 60
🏁 Script executed:
# Check the complete context of how maxUnavailable and minAvailable are used in the output
cat -n pkg/readiness/pdb_drain.go | sed -n '24,50p'
Length of output: 1267
🏁 Script executed:
# Let's understand how the IntOrString values are typically serialized/deserialized
# Check if there are any tests in the repository that show PDB structures
find . -name "*test*.go" -type f | xargs grep -l "PodDisruptionBudget\|pdb" | head -5
Length of output: 169
🏁 Script executed:
# Also check if there's documentation or examples of how PDB objects are handled
rg -n "Object\[" pkg/readiness/pdb_drain.go -B 2 -A 2
Length of output: 60
🏁 Script executed:
# Check the test file for PDB-related tests
rg -n "PodDisruptionBudget\|pdb" ./pkg/readiness/check_test.go -B 3 -A 3
Length of output: 60
🏁 Script executed:
# Also search for any YAML examples or test fixtures that show IntOrString values
find . -type f \( -name "*.yaml" -o -name "*.yml" \) | xargs grep -l "minAvailable\|maxUnavailable" 2>/dev/null | head -3
Length of output: 60
🏁 Script executed:
# Check the imports at the top of pdb_drain.go
head -20 pkg/readiness/pdb_drain.go
Length of output: 612
🏁 Script executed:
# Check if unstructured is already imported elsewhere in the client.go file
head -30 pkg/readiness/client.go
Length of output: 2309
🏁 Script executed:
# Let's verify what happens when NestedString encounters an integer value
# by checking if there are type conversions or coercions happening elsewhere
rg -n "fmt.Sprint\|fmt.Sprintf\|strconv" pkg/readiness/pdb_drain.go
Length of output: 60
🏁 Script executed:
# Check if maxUnavailable and minAvailable are used elsewhere or if this is the only place
rg -n "max_unavailable\|min_available" pkg/readiness/ --type go
Length of output: 60
🏁 Script executed:
# Let's also check if the proposed import is already available
rg -n "unstructured" pkg/readiness/pdb_drain.go
Length of output: 60
Preserve numeric PDB threshold values.
minAvailable and maxUnavailable are Kubernetes IntOrString fields that can be either integers or percentage strings. Using NestedString on an integer value causes a type assertion to fail, returning an empty string instead. PDBs with minAvailable: 1 or maxUnavailable: 0 will show empty threshold fields in the proposal data.
Use unstructured.NestedFieldNoCopy to capture the raw value and preserve both numeric and string formats:
Fix
+ "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

- maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
- minAvailable := NestedString(pdb.Object, "spec", "minAvailable")
+ maxUnavailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
+ minAvailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable")

Then cast to string when outputting if needed, or use the raw values in the issues map.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/pdb_drain.go` around lines 26 - 27, The code reads PDB
thresholds using NestedString which drops integer values; in
pkg/readiness/pdb_drain.go replace the NestedString calls for maxUnavailable and
minAvailable with unstructured.NestedFieldNoCopy(pdb.Object, "spec",
"maxUnavailable") and unstructured.NestedFieldNoCopy(pdb.Object, "spec",
"minAvailable") to capture the raw IntOrString values from pdb.Object, then
convert those raw values to strings only when building the output/issue map (or
keep the raw types if preferable) so numeric thresholds like 1 or 0 are
preserved instead of becoming empty strings.
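The type-switch this prompt describes can be sketched as below. The helper `thresholdString` is hypothetical, not the repository's `NestedString`, and it assumes values decoded from JSON the way the unstructured machinery produces them (JSON integers arrive as `int64`):

```go
package main

import "fmt"

// thresholdString renders a PDB threshold that may arrive from an
// unstructured object as an int64, float64, or percentage string.
// It walks the nested fields, then normalizes the leaf to a string.
func thresholdString(obj map[string]interface{}, fields ...string) (string, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		cur, ok = m[f]
		if !ok {
			return "", false
		}
	}
	switch v := cur.(type) {
	case string:
		return v, true // percentage form, e.g. "20%"
	case int64:
		return fmt.Sprintf("%d", v), true // JSON integer
	case float64:
		return fmt.Sprintf("%d", int64(v)), true // some decoders use float64
	default:
		return "", false
	}
}

func main() {
	pdb := map[string]interface{}{
		"spec": map[string]interface{}{
			"minAvailable":   int64(1),
			"maxUnavailable": "20%",
		},
	}
	min, _ := thresholdString(pdb, "spec", "minAvailable")
	max, _ := thresholdString(pdb, "spec", "maxUnavailable")
	fmt.Println(min, max) // 1 20%
}
```

With this shape, `minAvailable: 1` and `maxUnavailable: 0` survive as `"1"` and `"0"` instead of collapsing to empty strings.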
Force-pushed 1a82825 to 38fbcfb (Compare)
@harche: This pull request references OTA-1965 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
// Create a LightspeedProposal if the feature is enabled and updates are available.
// This is best-effort and must never block update discovery.
if optr.shouldCreateLightspeedProposals() {
There was a problem hiding this comment.
I am thinking of making another queue (thus the handling/consumer will be in another goroutine) which makes it look like a controller.
optr.proposalController.Queue().Add(optr.proposalController.QueueKey())

The details are:
79d8370#diff-4229ccef40cdb3dd7a8e5ca230d85fa0e74bbc265511ddd94f53acffbcd19b79R186
I did not know you are working on it already. 😄
WDYT?
There was a problem hiding this comment.
Sorry, it seems there is some misunderstanding. If you are already working on it, please continue on that path; I can close this PR and link yours.
```
### fetch openshift#1379
$ git add install/0000_00_cluster-version-operator_4*
```
Force-pushed 38fbcfb to f5cc163 (Compare)
There was a problem hiding this comment.
Actionable comments posted: 9
♻️ Duplicate comments (3)
pkg/readiness/node_capacity.go (1)
10-10: ⚠️ Potential issue | 🟡 Minor

Keep the check description aligned with its output.
This check reports node readiness and schedulability counts; it does not calculate resource headroom.
Proposed wording
-// NodeCapacityCheck assesses node readiness and resource headroom.
+// NodeCapacityCheck assesses node readiness and schedulability.

As per coding guidelines, “When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/node_capacity.go` at line 10, The comment for NodeCapacityCheck is inaccurate—update the docstring/comment and any kubernetes.io/description annotations near the NodeCapacityCheck function to reflect that it reports node readiness and schedulability counts (not resource headroom); locate the NodeCapacityCheck declaration in node_capacity.go and change the description text to explicitly say it assesses node readiness and counts schedulable/unschedulable nodes so the wording matches the actual output.

pkg/cvo/cvo.go (1)
775-779: ⚠️ Potential issue | 🟠 Major

Keep the proposal path from blocking sync.

This path is described as best-effort and non-blocking, but handleProposalAnnotation synchronously runs readiness checks downstream. Add an overall deadline at minimum, or enqueue this on a separate worker so proposal generation cannot delay status reconciliation.

Minimum bounded-call fix

  if optr.shouldCreateLightspeedProposals() {
-     optr.handleProposalAnnotation(ctx, original)
+     proposalCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
+     defer cancel()
+     optr.handleProposalAnnotation(proposalCtx, original)
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cvo/cvo.go` around lines 775 - 779, The proposal path currently calls optr.handleProposalAnnotation synchronously from the sync path (guarded by optr.shouldCreateLightspeedProposals()) which can block reconciliation; change this to a non-blocking bounded call by creating a short child context with a timeout (e.g., context.WithTimeout) and invoking handleProposalAnnotation in a separate goroutine (or enqueue the request to an existing worker queue) so the main sync returns immediately; ensure you pass the timeout-bound ctx into handleProposalAnnotation and log any timeout/error outcomes instead of letting it delay status reconciliation.

pkg/readiness/pdb_drain.go (1)
26-27: ⚠️ Potential issue | 🟡 Minor

Preserve numeric PDB threshold values.

minAvailable and maxUnavailable can be integer or percentage values. Reading them with NestedString loses integer thresholds, so proposal readiness data can omit the value that is actually blocking drains.

Proposed fix

  import (
      "context"
      "fmt"
+     "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
      "k8s.io/client-go/dynamic"
  )
@@
- maxUnavailable := NestedString(pdb.Object, "spec", "maxUnavailable")
- minAvailable := NestedString(pdb.Object, "spec", "minAvailable")
+ maxUnavailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "maxUnavailable")
+ minAvailable, _, _ := unstructured.NestedFieldNoCopy(pdb.Object, "spec", "minAvailable")

Read-only verification:

#!/bin/bash
# Verify current PDB threshold extraction and helper behavior.
rg -n -C3 'maxUnavailable|minAvailable|func NestedString' pkg/readiness

Also applies to: 34-42
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/pdb_drain.go` around lines 26 - 27, The code currently calls NestedString(pdb.Object, "spec", "maxUnavailable") and NestedString(... "minAvailable") which coerces numeric PDB thresholds to strings and loses integer values; change the extraction to fetch the raw value (use unstructured.NestedFieldCopy or NestedFieldNoCopy on pdb.Object with "spec","maxUnavailable" and "spec","minAvailable"), then type-switch the returned interface{} to handle integers (int/int64/float64) and strings, converting numbers to their exact string representation only when needed so numeric thresholds are preserved; update any downstream usage that assumes string to accept the normalized string produced by your type-switch.
🧹 Nitpick comments (2)
pkg/proposal/proposal_test.go (1)
139-161: LGTM on coverage.

Table-driven tests cover the z-stream / minor / unknown cases and unparseable inputs. One naming nit: the "major" case (Line 148) actually exercises the minor-bump branch — consider renaming it to something like "major-bump-classified-as-minor" to make the intent obvious, since classifyUpdate deliberately doesn't distinguish major bumps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/proposal/proposal_test.go` around lines 139 - 161, Rename the misleading test case name inside TestClassifyUpdate so it reflects the behavior being tested: change the struct entry with name "major" (which calls classifyUpdate("4.15.1","5.0.0")) to a clearer label such as "major-bump-classified-as-minor" (keeping the expected value "minor"); this makes the intent explicit while leaving the classifyUpdate call and expected assertion unchanged.

pkg/proposal/proposal.go (1)
275-283: Consider making maxAttempts configurable.

maxAttempts: int64(2) is hard-coded. The CRD schema supports 0–20, and Config already plumbs other tunables (Namespace, Workflow, PromptConfigMap); exposing MaxAttempts (with the current 2 as default) avoids future code churn and keeps behavior consistent with how other knobs are surfaced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/proposal/proposal.go` around lines 275 - 283, Add a MaxAttempts configuration knob and use it instead of the hard-coded int64(2): add a MaxAttempts int (default 2) to the Config type that c.config references, validate/clip it to the CRD-supported range 0–20 when the config is constructed or loaded, and replace the literal int64(2) in the proposal spec construction (where "spec" -> "maxAttempts" is set) with int64(c.config.MaxAttempts) so the value is configurable and bounded.
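The clamping the prompt describes can be sketched as below; `clampMaxAttempts` is a hypothetical helper, assuming the current default of 2 and the CRD-supported 0–20 range:

```go
package main

import "fmt"

// clampMaxAttempts turns an optional configured value into the int64 the
// proposal spec needs: nil keeps the existing default of 2, and any
// configured value is clipped to the CRD-supported range 0-20.
func clampMaxAttempts(configured *int) int64 {
	if configured == nil {
		return 2 // preserve the previously hard-coded default
	}
	v := *configured
	if v < 0 {
		v = 0
	}
	if v > 20 {
		v = 20
	}
	return int64(v)
}

func main() {
	fmt.Println(clampMaxAttempts(nil)) // 2
	n := 50
	fmt.Println(clampMaxAttempts(&n)) // 20
}
```

The proposal builder would then set "maxAttempts": clampMaxAttempts(c.config.MaxAttempts) instead of the literal int64(2).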
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`:
- Around line 44-47: The CRD's openAPIV3Schema.description claims "Proposal is
cluster-scoped" but the CRD sets scope: Namespaced; fix by either changing
scope: Namespaced to scope: Cluster in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml or by
updating openAPIV3Schema.description (and any kubernetes.io/description/doc
comments) to accurately state it is namespaced; then ensure this choice matches
the runtime usage in pkg/proposal/proposal.go (the Create(...) call) and update
nearby comments/annotations to be consistent with the selected scope.
- Around line 6-10: The CRD manifest metadata for proposals.agentic.openshift.io
is missing the cluster-profile annotations used by CVO; update the
metadata.annotations block in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml to add
the same include.release.openshift.io/* profile keys used across other install
manifests (for example
include.release.openshift.io/self-managed-high-availability: "true" and any
other profiles your repo requires) so the CRD is rendered under the appropriate
cluster profiles when the feature gate (release.openshift.io/feature-gate:
LightspeedProposals) is enabled.
In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`:
- Around line 6-10: Add the required release profile annotations and a
description to the CRD metadata: inside the metadata.annotations block for the
resource named agents.agentic.openshift.io, add the
include.release.openshift.io/* profile annotations (e.g.,
include.release.openshift.io/approved: "true" and the appropriate profile keys
used across manifests such as include.release.openshift.io/optional,
include.release.openshift.io/cluster or similar per your repo conventions) and
add a kubernetes.io/description annotation that succinctly explains this is a
temporary Agent CRD used for LightspeedProposals and its intended lifecycle;
update the annotations list under metadata.annotations in the manifest where
name: agents.agentic.openshift.io appears.
- Around line 98-115: The CRD currently allows invalid Agent specs because
llmRef.name is not required and outputFields vs rawOutputSchema mutual exclusion
is not enforced; update the Agent CRD OpenAPI schema: mark llmRef.name as
required (add "required: [\"name\"]" under llmRef properties or add it to the
containing object) and add a validation that enforces mutual exclusivity between
spec.outputFields and spec.rawOutputSchema (use oneOf/anyOf with schema
predicates or an x-kubernetes-validation rule that checks exactly one is set),
applying the same changes to the other Agent-like schema blocks referenced
(lines ~213-447, 518-521) so invalid Agents are rejected at admission.
In `@install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml`:
- Around line 6-10: The manifest for the CRD named
workflows.agentic.openshift.io is missing required cluster-profile and
description annotations; add appropriate include.release.openshift.io/* profile
annotations (e.g., include.release.openshift.io/accept-never,
include.release.openshift.io/optional, or the correct profiles your repo uses)
under metadata.annotations and add a kubernetes.io/description annotation that
briefly explains this is a temporary Workflow CRD for LightspeedProposals and
its intended scope/purpose so the resource has both release-profile and
descriptive metadata.
- Around line 95-119: The CRD descriptions state agentRef must be omitted or nil
when skip is true but the validation only requires agentRef.name when skip is
false; add a CEL validation to enforce the contract (e.g., a rule that when skip
== true then agentRef is null/absent or agentRef.name is empty, and when skip ==
false agentRef.name must be present) for the step object containing the skip and
agentRef fields (referencing agentRef, agentRef.name and skip) or alternatively
relax the description to match existing CEL rules—apply the same change to the
other step schemas called out (lines 127-151, 159-183, 189-198).
In `@pkg/cvo/cvo.go`:
- Around line 1275-1286: The code uses optr.release.Version for readiness output
and proposal creation which can miss merged fallback metadata; call and use the
merged currentVersion() result instead: obtain v := optr.currentVersion() (or
however currentVersion() is exposed), pass v into readiness.RunAll(ctx, dc, v,
targetVersion) and into optr.proposalCreator.MaybeCreateProposal(ctx, v,
targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON), and
remove references to optr.release.Version so readinessJSON and proposal metadata
consistently use the merged version.
In `@pkg/readiness/network.go`:
- Around line 33-43: The readiness checker currently embeds raw proxy config
into result by reading GetResource(ctx, dc, GVRProxy, "cluster") and putting
NestedString(proxy.Object, "spec", "httpProxy"/"httpsProxy"/"noProxy") into
result; change this to avoid exposing credentials/domains by replacing the raw
strings with redacted summaries or booleans (e.g., "http_proxy": {"present":
true, "redacted": true, "length": N} or simply {"present": true}) and/or
sanitized metadata (e.g., host-only or masked value), update the code that sets
result["proxy"] (and any callers expecting these keys) accordingly, and keep
error handling via SectionError unchanged; use the proxy variable, GVRProxy,
GetResource and NestedString locations to find the block to modify.
In `@test/cvo/readiness.go`:
- Line 20: The g.Describe block for the cluster-version-operator readiness E2E
test is missing Ginkgo labels; update the g.Describe call (the Describe defined
as var _ = g.Describe(`[Jira:"Cluster Version Operator"]
cluster-version-operator readiness checks`, func() { ... })) to include g.Label
calls for CI categorization (for example add g.Label("Serial") and any relevant
ticket/feature label like g.Label("OTA-1813") or a TechPreview label)
immediately inside the Describe block so the suite is selectable/filterable by
CI tooling.
---
Duplicate comments:
In `@pkg/cvo/cvo.go`:
- Around line 775-779: The proposal path currently calls
optr.handleProposalAnnotation synchronously from the sync path (guarded by
optr.shouldCreateLightspeedProposals()) which can block reconciliation; change
this to a non-blocking bounded call by creating a short child context with a
timeout (e.g., context.WithTimeout) and invoking handleProposalAnnotation in a
separate goroutine (or enqueue the request to an existing worker queue) so the
main sync returns immediately; ensure you pass the timeout-bound ctx into
handleProposalAnnotation and log any timeout/error outcomes instead of letting
it delay status reconciliation.
In `@pkg/readiness/node_capacity.go`:
- Line 10: The comment for NodeCapacityCheck is inaccurate—update the
docstring/comment and any kubernetes.io/description annotations near the
NodeCapacityCheck function to reflect that it reports node readiness and
schedulability counts (not resource headroom); locate the NodeCapacityCheck
declaration in node_capacity.go and change the description text to explicitly
say it assesses node readiness and counts schedulable/unschedulable nodes so the
wording matches the actual output.
In `@pkg/readiness/pdb_drain.go`:
- Around line 26-27: The code currently calls NestedString(pdb.Object, "spec",
"maxUnavailable") and NestedString(... "minAvailable") which coerces numeric PDB
thresholds to strings and loses integer values; change the extraction to fetch
the raw value (use unstructured.NestedFieldCopy or NestedFieldNoCopy on
pdb.Object with "spec","maxUnavailable" and "spec","minAvailable"), then
type-switch the returned interface{} to handle integers (int/int64/float64) and
strings, converting numbers to their exact string representation only when
needed so numeric thresholds are preserved; update any downstream usage that
assumes string to accept the normalized string produced by your type-switch.
---
Nitpick comments:
In `@pkg/proposal/proposal_test.go`:
- Around line 139-161: Rename the misleading test case name inside
TestClassifyUpdate so it reflects the behavior being tested: change the struct
entry with name "major" (which calls classifyUpdate("4.15.1","5.0.0")) to a
clearer label such as "major-bump-classified-as-minor" (keeping the expected
value "minor"); this makes the intent explicit while leaving the classifyUpdate
call and expected assertion unchanged.
In `@pkg/proposal/proposal.go`:
- Around line 275-283: Add a MaxAttempts configuration knob and use it instead
of the hard-coded int64(2): add a MaxAttempts int (default 2) to the Config type
that c.config references, validate/clip it to the CRD-supported range 0–20 when
the config is constructed or loaded, and replace the literal int64(2) in the
proposal spec construction (where "spec" -> "maxAttempts" is set) with
int64(c.config.MaxAttempts) so the value is configurable and bounded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: e7f06f79-f57a-4dae-a3b6-f452cd6f89ce
⛔ Files ignored due to path filters (3)
- go.sum is excluded by !**/*.sum, !go.sum
- vendor/github.com/openshift/api/features/features.go is excluded by !vendor/**, !**/vendor/**
- vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (30)
- AGENTS.md
- go.mod
- install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml
- install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml
- install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml
- install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
- install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml
- install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
- pkg/cvo/availableupdates.go
- pkg/cvo/cvo.go
- pkg/cvo/status_test.go
- pkg/featuregates/featuregates.go
- pkg/proposal/proposal.go
- pkg/proposal/proposal_test.go
- pkg/readiness/api_deprecations.go
- pkg/readiness/check.go
- pkg/readiness/check_test.go
- pkg/readiness/checks_test.go
- pkg/readiness/client.go
- pkg/readiness/client_test.go
- pkg/readiness/cluster_conditions.go
- pkg/readiness/crd_compat.go
- pkg/readiness/etcd_health.go
- pkg/readiness/network.go
- pkg/readiness/node_capacity.go
- pkg/readiness/olm_lifecycle.go
- pkg/readiness/olm_lifecycle_test.go
- pkg/readiness/operator_health.go
- pkg/readiness/pdb_drain.go
- test/cvo/readiness.go
✅ Files skipped from review due to trivial changes (8)
- go.mod
- pkg/cvo/status_test.go
- install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
- install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
- pkg/readiness/operator_health.go
- pkg/readiness/client_test.go
- pkg/cvo/availableupdates.go
- pkg/readiness/checks_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- AGENTS.md
- pkg/readiness/check_test.go
- pkg/readiness/crd_compat.go
- pkg/readiness/api_deprecations.go
- pkg/readiness/olm_lifecycle_test.go
- pkg/readiness/cluster_conditions.go
metadata:
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
    controller-gen.kubebuilder.io/version: v0.19.0
  name: proposals.agentic.openshift.io
Missing cluster-profile include.release.openshift.io/* annotations.
Other manifests in install/ carry include.release.openshift.io/self-managed-high-availability: "true" (and friends). This CRD has only the feature-gate annotation, which means CVO may not render it on any profile. Please add the appropriate cluster-profile annotations so the stub CRD is actually installed when the feature gate is enabled.
As per coding guidelines, “All manifests must have appropriate cluster-profile annotations (include.release.openshift.io/self-managed-high-availability, etc.)”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`
around lines 6 - 10, The CRD manifest metadata for
proposals.agentic.openshift.io is missing the cluster-profile annotations used
by CVO; update the metadata.annotations block in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml to add
the same include.release.openshift.io/* profile keys used across other install
manifests (for example
include.release.openshift.io/self-managed-high-availability: "true" and any
other profiles your repo requires) so the CRD is rendered under the appropriate
cluster profiles when the feature gate (release.openshift.io/feature-gate:
LightspeedProposals) is enabled.
operator itself (escalation child proposals).\n\nProposal is cluster-scoped.
The operator watches for new Proposals and\ndrives them through the lifecycle
automatically. Users interact with\nproposals in the Proposed phase to approve,
deny, or escalate.\n\nExample — a remediation proposal targeting a specific
Schema description contradicts scope: Namespaced.
Line 18 declares scope: Namespaced, but the openAPIV3Schema.description states “Proposal is cluster-scoped. The operator watches for new Proposals and drives them through the lifecycle automatically.” This will confuse consumers/agents reading the CRD. Either make the CRD cluster-scoped (to match the upstream intent and the stub comment at line 1–2) or update the description to reflect namespaced behavior — and double-check this aligns with the namespaced Create(...) call in pkg/proposal/proposal.go (Line 284).
As per coding guidelines, “check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior. Outdated documentation is worse than no documentation.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`
around lines 44 - 47, The CRD's openAPIV3Schema.description claims "Proposal is
cluster-scoped" but the CRD sets scope: Namespaced; fix by either changing
scope: Namespaced to scope: Cluster in
install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml or by
updating openAPIV3Schema.description (and any kubernetes.io/description/doc
comments) to accurately state it is namespaced; then ensure this choice matches
the runtime usage in pkg/proposal/proposal.go (the Create(...) call) and update
nearby comments/annotations to be consistent with the selected scope.
metadata:
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
    controller-gen.kubebuilder.io/version: v0.19.0
  name: agents.agentic.openshift.io
Add release profile and resource description annotations.
This new install manifest only has the feature-gate annotation. Add the appropriate include.release.openshift.io/* profile annotations and a kubernetes.io/description explaining the temporary Agent CRD.
As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`
around lines 6 - 10, Add the required release profile annotations and a
description to the CRD metadata: inside the metadata.annotations block for the
resource named agents.agentic.openshift.io, add the
include.release.openshift.io/* profile annotations (e.g.,
include.release.openshift.io/approved: "true" and the appropriate profile keys
used across manifests such as include.release.openshift.io/optional,
include.release.openshift.io/cluster or similar per your repo conventions) and
add a kubernetes.io/description annotation that succinctly explains this is a
temporary Agent CRD used for LightspeedProposals and its intended lifecycle;
update the annotations list under metadata.annotations in the manifest where
name: agents.agentic.openshift.io appears.
llmRef:
  description: |-
    llmRef references a cluster-scoped LlmProvider CR that supplies the
    LLM backend for this agent. The operator resolves this reference at
    reconcile time and configures the sandbox pod with the provider's
    credentials and model.
  properties:
    name:
      default: ""
      description: |-
        Name of the referent.
        This field is effectively required, but due to backwards compatibility is
        allowed to be empty. Instances of this type with an empty value here are
        almost certainly wrong.
        More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
      type: string
  type: object
  x-kubernetes-map-type: atomic
Reject invalid Agent specs at admission.
llmRef.name is effectively required but can be empty, and outputFields/rawOutputSchema are documented as mutually exclusive without enforcement. Add spec-level validation so invalid Agents fail early.
Proposed validation
required:
- llmRef
- skills
type: object
+ x-kubernetes-validations:
+ - message: llmRef.name is required
+ rule: has(self.llmRef) && self.llmRef.name != ''
+ - message: outputFields and rawOutputSchema are mutually exclusive
+ rule: '!(has(self.outputFields) && has(self.rawOutputSchema))'

Also applies to: 213-447, 518-521
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`
around lines 98 - 115, The CRD currently allows invalid Agent specs because
llmRef.name is not required and outputFields vs rawOutputSchema mutual exclusion
is not enforced; update the Agent CRD OpenAPI schema: mark llmRef.name as
required (add "required: [\"name\"]" under llmRef properties or add it to the
containing object) and add a validation that enforces mutual exclusivity between
spec.outputFields and spec.rawOutputSchema (use oneOf/anyOf with schema
predicates or an x-kubernetes-validation rule that checks exactly one is set),
applying the same changes to the other Agent-like schema blocks referenced
(lines ~213-447, 518-521) so invalid Agents are rejected at admission.
metadata:
  annotations:
    release.openshift.io/feature-gate: LightspeedProposals
    controller-gen.kubebuilder.io/version: v0.19.0
  name: workflows.agentic.openshift.io
Add release profile and resource description annotations.
This new install manifest only has the feature-gate annotation. Add the appropriate include.release.openshift.io/* profile annotations and a kubernetes.io/description explaining the temporary Workflow CRD.
As per coding guidelines, “All manifests must have appropriate cluster-profile annotations” and “Use kubernetes.io/description annotations to explain the resource's purpose.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml`
around lines 6 - 10, The manifest for the CRD named
workflows.agentic.openshift.io is missing required cluster-profile and
description annotations; add appropriate include.release.openshift.io/* profile
annotations (e.g., include.release.openshift.io/accept-never,
include.release.openshift.io/optional, or the correct profiles your repo uses)
under metadata.annotations and add a kubernetes.io/description annotation that
briefly explains this is a temporary Workflow CRD for LightspeedProposals and
its intended scope/purpose so the resource has both release-profile and
descriptive metadata.
dc := optr.proposalCreator.DynamicClient()
output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
	klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
	readinessJSON = string(data)
	klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
		output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}

optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
Use currentVersion() for proposal metadata.
optr.release.Version can miss fallback metadata that currentVersion() already merges from cached available updates. Use the merged value consistently for readiness output and proposal labels/request content.
Proposed fix
dc := optr.proposalCreator.DynamicClient()
- output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
+ currentVersion := optr.currentVersion().Version
+ output := readiness.RunAll(ctx, dc, currentVersion, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
@@
- optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
+ optr.proposalCreator.MaybeCreateProposal(ctx, currentVersion, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

dc := optr.proposalCreator.DynamicClient()
output := readiness.RunAll(ctx, dc, optr.release.Version, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
	klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
	readinessJSON = string(data)
	klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
		output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}
optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)

Suggested:

dc := optr.proposalCreator.DynamicClient()
currentVersion := optr.currentVersion().Version
output := readiness.RunAll(ctx, dc, currentVersion, targetVersion)
var readinessJSON string
if data, err := json.Marshal(output); err != nil {
	klog.Warningf("Failed to marshal readiness check output: %v", err)
} else {
	readinessJSON = string(data)
	klog.V(2).Infof("Readiness check completed (%d/%d ok, %.1fs): %s",
		output.Meta.ChecksOK, output.Meta.TotalChecks, output.Meta.ElapsedSeconds, readinessJSON)
}
optr.proposalCreator.MaybeCreateProposal(ctx, currentVersion, targetVersion, targetKind, config.Spec.Channel, au.Updates, readinessJSON)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/cvo/cvo.go` around lines 1275 - 1286, The code uses optr.release.Version
for readiness output and proposal creation which can miss merged fallback
metadata; call and use the merged currentVersion() result instead: obtain v :=
optr.currentVersion() (or however currentVersion() is exposed), pass v into
readiness.RunAll(ctx, dc, v, targetVersion) and into
optr.proposalCreator.MaybeCreateProposal(ctx, v, targetVersion, targetKind,
config.Spec.Channel, au.Updates, readinessJSON), and remove references to
optr.release.Version so readinessJSON and proposal metadata consistently use the
merged version.
// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
	SectionError(&sectionErrors, "proxy", err)
} else {
	result["proxy"] = map[string]any{
		"http_proxy":  NestedString(proxy.Object, "spec", "httpProxy"),
		"https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
		"no_proxy":    NestedString(proxy.Object, "spec", "noProxy"),
	}
}
Do not embed raw proxy configuration in proposal data.
httpProxy / httpsProxy can include credentials, and noProxy can expose internal domains. Because readiness JSON is embedded into Proposal.spec.request, emit booleans or redacted summaries instead of raw values.
Proposed redaction
} else {
+ httpProxy := NestedString(proxy.Object, "spec", "httpProxy")
+ httpsProxy := NestedString(proxy.Object, "spec", "httpsProxy")
+ noProxy := NestedString(proxy.Object, "spec", "noProxy")
result["proxy"] = map[string]any{
- "http_proxy": NestedString(proxy.Object, "spec", "httpProxy"),
- "https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
- "no_proxy": NestedString(proxy.Object, "spec", "noProxy"),
+ "http_proxy_configured": httpProxy != "",
+ "https_proxy_configured": httpsProxy != "",
+ "no_proxy_configured": noProxy != "",
}
}
📝 Committable suggestion
Original:

// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
	SectionError(&sectionErrors, "proxy", err)
} else {
	result["proxy"] = map[string]any{
		"http_proxy":  NestedString(proxy.Object, "spec", "httpProxy"),
		"https_proxy": NestedString(proxy.Object, "spec", "httpsProxy"),
		"no_proxy":    NestedString(proxy.Object, "spec", "noProxy"),
	}
}

Suggested:

// Check proxy
proxy, err := GetResource(ctx, dc, GVRProxy, "cluster")
if err != nil {
	SectionError(&sectionErrors, "proxy", err)
} else {
	httpProxy := NestedString(proxy.Object, "spec", "httpProxy")
	httpsProxy := NestedString(proxy.Object, "spec", "httpsProxy")
	noProxy := NestedString(proxy.Object, "spec", "noProxy")
	result["proxy"] = map[string]any{
		"http_proxy_configured":  httpProxy != "",
		"https_proxy_configured": httpsProxy != "",
		"no_proxy_configured":    noProxy != "",
	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/network.go` around lines 33 - 43, The readiness checker
currently embeds raw proxy config into result by reading GetResource(ctx, dc,
GVRProxy, "cluster") and putting NestedString(proxy.Object, "spec",
"httpProxy"/"httpsProxy"/"noProxy") into result; change this to avoid exposing
credentials/domains by replacing the raw strings with redacted summaries or
booleans (e.g., "http_proxy": {"present": true, "redacted": true, "length": N}
or simply {"present": true}) and/or sanitized metadata (e.g., host-only or
masked value), update the code that sets result["proxy"] (and any callers
expecting these keys) accordingly, and keep error handling via SectionError
unchanged; use the proxy variable, GVRProxy, GetResource and NestedString
locations to find the block to modify.
| "github.com/openshift/cluster-version-operator/test/util" | ||
| ) | ||
|
|
||
| var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() { |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect existing Ginkgo labels in test suites.
rg -n -C2 'g\.Describe\(|g\.Context\(|g\.It\(|g\.Label\(' test --type go

Repository: openshift/cluster-version-operator
Length of output: 6506
🏁 Script executed:
# Check the full readiness.go file structure for all Ginkgo blocks and external endpoints
cat -n test/cvo/readiness.go

Repository: openshift/cluster-version-operator
Length of output: 9647
Add Ginkgo labels for this E2E test suite.
The test follows the correct Jira-formatted name but lacks required Ginkgo labels for CI selection and filtering. Add the appropriate category label(s) to the g.Describe block to comply with E2E conventions. The pattern used by other tests in this directory (e.g., g.Label("Serial"), g.Label("OTA-1813")) shows how labels organize test categorization.
Example structure
-var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, func() {
+var _ = g.Describe(`[Jira:"Cluster Version Operator"] cluster-version-operator readiness checks`, g.Label("category-label"), func() {

Replace category-label with the appropriate category for this test (e.g., Serial, a bug reference, or feature category).
As per coding guidelines, "Use Ginkgo Labels to mark test categories (e.g., TechPreview, serial)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/cvo/readiness.go` at line 20, The g.Describe block for the
cluster-version-operator readiness E2E test is missing Ginkgo labels; update the
g.Describe call (the Describe defined as var _ = g.Describe(`[Jira:"Cluster
Version Operator"] cluster-version-operator readiness checks`, func() { ... }))
to include g.Label calls for CI categorization (for example add
g.Label("Serial") and any relevant ticket/feature label like g.Label("OTA-1813")
or a TechPreview label) immediately inside the Describe block so the suite is
selectable/filterable by CI tooling.
Integrate CVO with the Lightspeed agent framework by creating Proposal CRs when available updates are discovered. This includes OLM operator lifecycle readiness checks, product lifecycle (PLCC) data via a shared agentic-skills image, and proposal creation gated behind the LightspeedProposals feature gate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f5cc163 to
07f4de9
Actionable comments posted: 3
♻️ Duplicate comments (2)
pkg/readiness/client.go (2)
153-163: ⚠️ Potential issue | 🟠 Major: Return parse errors instead of treating them as equality.

CompareVersions still returns 0 when either version fails to parse, so callers cannot distinguish invalid input from equal versions. That can silently skew readiness decisions that depend on version ordering. This was already flagged and appears unresolved.

🛠️ Proposed local fix

-// CompareVersions compares two semver strings. Returns -1, 0, or 1.
-func CompareVersions(a, b string) int {
+// CompareVersions compares two semver strings. Returns -1, 0, or 1.
+func CompareVersions(a, b string) (int, error) {
 	va, err := semver.ParseTolerant(a)
 	if err != nil {
-		return 0
+		return 0, err
 	}
 	vb, err := semver.ParseTolerant(b)
 	if err != nil {
-		return 0
+		return 0, err
 	}
-	return va.Compare(vb)
+	return va.Compare(vb), nil
 }

Verify and update all callers to handle the new error path:

#!/bin/bash
# Description: Find CompareVersions callers and surrounding branch logic.
# Expectation: every caller explicitly handles the returned error before using the comparison result.
rg -n -C3 --type=go '\bCompareVersions\s*\('

As per coding guidelines, "When modifying code, check that nearby comments, kubernetes.io/description annotations, and doc strings still accurately describe the new behavior."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/client.go` around lines 153 - 163, CompareVersions currently masks parse errors by returning 0; change its signature from CompareVersions(a, b string) int to CompareVersions(a, b string) (int, error) and return a non-nil error when semver.ParseTolerant(a) or semver.ParseTolerant(b) fails instead of returning 0; on success return va.Compare(vb) with nil error. Update the function comment to document the new (int, error) return and then find and update all callers of CompareVersions to handle the error path (e.g., via the suggested rg search) before using the comparison result, and ensure any kubernetes.io/description comments or docstrings near CompareVersions still accurately describe the new behavior.
3-5: ⚠️ Potential issue | 🟡 Minor: Sort selector parts for deterministic output.

FormatLabelSelector still depends on Go map iteration order, so the same label map can produce different selector strings. This was already flagged and appears unresolved.

♻️ Proposed fix

 import (
 	"context"
+	"sort"
 	"strings"

 func FormatLabelSelector(labels map[string]string) string {
 	parts := make([]string, 0, len(labels))
 	for k, v := range labels {
 		parts = append(parts, k+"="+v)
 	}
+	sort.Strings(parts)
 	return strings.Join(parts, ",")
 }

As per coding guidelines, "When sorting or deduplicating collections, ensure stable ordering by including a tiebreaker field (e.g., sort by version then by name)."
Also applies to: 166-172
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/client.go` around lines 3 - 5, FormatLabelSelector currently builds the selector from a map and relies on Go's map iteration order; change it to collect each "key=value" (or "key in (v1,v2)" parts) into a slice, sort that slice deterministically using a stable comparator that first sorts by key and then by value (as a tiebreaker), and then join the sorted parts to form the selector string. Apply the same approach to the other selector-construction logic around lines 166-172 (the other function that assembles selector parts) so both functions produce stable, deterministic output.
🧹 Nitpick comments (4)
pkg/readiness/client_test.go (1)
109-126: Use strings.Contains; the "empty" case asserts nothing.

The inner triple loop re-implements strings.Contains. Also, the "empty" sub-case has contains: []string{}, so the inner loop body never executes — it's effectively a no-op (a dedicated assertion like got == "" would make the intent real).

Suggested cleanup

-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			got := FormatLabelSelector(tt.labels)
-			for _, s := range tt.contains {
-				found := false
-				for i := 0; i <= len(got)-len(s); i++ {
-					if got[i:i+len(s)] == s {
-						found = true
-						break
-					}
-				}
-				if !found {
-					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
-				}
-			}
-		})
-	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := FormatLabelSelector(tt.labels)
+			if len(tt.contains) == 0 && got != "" {
+				t.Errorf("FormatLabelSelector(%v) = %q, want empty", tt.labels, got)
+			}
+			for _, s := range tt.contains {
+				if !strings.Contains(got, s) {
+					t.Errorf("FormatLabelSelector(%v) = %q, want to contain %q", tt.labels, got, s)
+				}
+			}
+		})
+	}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/client_test.go` around lines 109 - 126, Replace the manual substring search in the test loop with strings.Contains by importing "strings" and using if !strings.Contains(got, s) to check each expected substring for FormatLabelSelector; additionally, make the "empty" test case assert the actual intended output (e.g., assert got == "" or whatever the expected string is) instead of leaving contains: []string{} which makes the inner loop a no-op—update the tests slice entry for the "empty" case to include the explicit expected value and assert it (refer to the FormatLabelSelector call and the tests variable in the test function).

pkg/readiness/checks_test.go (1)
13-37: Consider sharing the GVR→list-kind map with production code.

This gvrs map duplicates knowledge that should already exist alongside the GVRClusterVersion, GVRClusterOperator, etc. declarations (probably pkg/readiness/client.go). If a new GVR constant is added and used by a check but not registered here, tests for that check will surface a confusing "no kind" error rather than a missing-fixture assertion. A small exported helper (e.g., readiness.ListKinds()) used by both the production code and the test fake would prevent that drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/checks_test.go` around lines 13 - 37, The test's gvrs map in newFakeDynamicClient duplicates the production mapping of GVR→list-kind; add an exported helper (e.g., ListKinds() returning map[schema.GroupVersionResource]string) in the readiness package (alongside the GVR constants in client.go), switch newFakeDynamicClient to call readiness.ListKinds() instead of inlining gvrs, and update production code to use the same helper so both runtime and tests share one source of truth and avoid drift when new GVRs are added.

pkg/readiness/check.go (1)
83-151: LGTM, minor observability gap.

The concurrent orchestration with per-check timeout, error capture, and aggregated counts is clean. One small suggestion: when a check errors out (including timeout), nothing is logged at the RunAll level — callers embedding this JSON into a Proposal will see the _error field, but on-cluster debugging via CVO logs won't show which check failed or why. A single klog.V(4).Infof (or Warningf for non-timeout errors) inside the goroutine on the error branch would help operational triage without affecting output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/check.go` around lines 83 - 151, When a check fails inside RunAll (the goroutine that calls ch.Run with perCheckTimeout), add a klog emit in the error branch so failures are visible in cluster logs: detect timeout vs other errors (the err returned from ch.Run) and call klog.V(4).Infof for timeout-related errors and klog.Warningf for non-timeout errors, include the check name (ch.Name()), the error string, elapsed seconds, and any short snippet of result data; keep this logging inside the same goroutine before populating results[ch.Name()] so the existing CheckResult building (Status, Error, Data) remains unchanged.

pkg/readiness/api_deprecations.go (1)
20-31: Brittle string-match for "resource not available" detection.

Using strings.Contains(err.Error(), "not found") to classify a non-fatal absence is fragile — error message wording can change across client-go versions, and "not found" also legitimately appears in unrelated errors (e.g., a missing namespace). Prefer typed checks against meta.IsNoMatchError(err) and apierrors.IsNotFound(err), which is exactly the pattern pkg/proposal/proposal.go (isNoMatchError) already uses in this PR.

🛡️ Proposed fix

 import (
 	"context"
 	"fmt"
-	"strings"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	"k8s.io/apimachinery/pkg/api/meta"
 	"k8s.io/client-go/dynamic"
 )
@@
-	if strings.Contains(err.Error(), "not found") {
+	if meta.IsNoMatchError(err) || apierrors.IsNotFound(err) {
 		result["warning"] = "APIRequestCount resource not available"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/readiness/api_deprecations.go` around lines 20 - 31, Replace the brittle string match on err.Error() in the APIRequestCount listing with proper typed checks: when ListResources(ctx, dc, GVRAPIRequestCount, "") returns an error, test it using meta.IsNoMatchError(err) and k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or the repo helper isNoMatchError if available) instead of strings.Contains; if either indicates the resource/GVR is absent, set the same warning/blocker fields and return result, otherwise propagate the error (the affected symbols are ListResources, GVRAPIRequestCount and the error handling branch around that call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/cvo/cvo.go`:
- Around line 1247-1252: When creating proposals for conditional updates, pass
combined context containing both recommended and conditional versions instead of
only au.Updates: inside the loop that calls
optr.proposalCreator.MaybeCreateProposal for each entry of
au.ConditionalUpdates, build a deduplicated slice that merges au.Updates and the
versions from au.ConditionalUpdates (or alternatively extend the buildRequest
signature to accept conditional updates) and pass that merged slice as the
"other updates" argument so buildRequest can list both recommended and
conditional alternatives when generating the proposal.
In `@pkg/proposal/proposal.go`:
- Around line 171-185: In Creator.readSystemPrompt, currently when the ConfigMap
exists but the "prompt" key is missing or not a string the code silently returns
"", so change the function to log that misconfiguration before returning: detect
the missing-or-wrong-type branch after unstructured.NestedMap(obj.Object,
"data") and emit a klog.V(2).Infof or klog.Warningf including
c.config.Namespace, c.config.PromptConfigMap and a clear message that the
"prompt" key is missing or not a string (reference symbols: readSystemPrompt,
Creator, unstructured.NestedMap); keep behavior of returning "" but ensure the
log is written to make the issue discoverable.
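The lookup described above can be sketched as follows. The helper name promptFromConfigMap and the plain nested map are illustrative stand-ins for readSystemPrompt and the unstructured ConfigMap object; the real code would emit the diagnostic via klog.Warningf with namespace and ConfigMap name.

```go
package main

import "fmt"

// promptFromConfigMap mirrors the comma-ok type assertions that
// unstructured.NestedMap-based code performs: it returns the "prompt"
// value plus a diagnostic when the key is missing or has the wrong type,
// instead of silently returning "".
func promptFromConfigMap(obj map[string]any) (string, string) {
	data, ok := obj["data"].(map[string]any)
	if !ok {
		return "", "ConfigMap has no data section"
	}
	prompt, ok := data["prompt"].(string)
	if !ok {
		// The real code would klog.Warningf here before returning "".
		return "", `"prompt" key is missing or not a string`
	}
	return prompt, ""
}

func main() {
	_, diag := promptFromConfigMap(map[string]any{"data": map[string]any{"prompt": 42}})
	fmt.Println(diag) // prints "prompt" key is missing or not a string
}
```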
In `@pkg/readiness/cluster_conditions.go`:
- Around line 34-42: The code treats a missing Upgradeable condition
(conditions[ConditionUpgradeable]) as a zero-value and reports non-upgradeable;
change the logic that builds result["upgradeable"] (and any use of
summary.upgradeable) to first check whether the ConditionUpgradeable key exists
in the conditions map, and if absent default to upgradeable=true (or status
"True"/"Unknown" per your conventions), only setting upgradeable=false when the
condition is present and its Status == ConditionFalse; similarly, for
ConditionProgressing handle missing keys explicitly (keep default false if you
prefer) instead of relying on zero-value semantics so
result["update_in_progress"] accurately reflects an explicit Progressing=False
vs missing condition.
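The missing-condition defaulting described above can be sketched with a comma-ok lookup. The helper and the string-keyed condition map are illustrative; the real code reads typed ClusterOperator conditions.

```go
package main

import "fmt"

// upgradeableFromConditions treats a missing Upgradeable condition as
// upgradeable (no operator has vetoed the update) and only reports false
// when the condition is present with Status == "False". "Unknown" does
// not block, matching the explicit-vs-absent distinction above.
func upgradeableFromConditions(conditions map[string]string) bool {
	status, present := conditions["Upgradeable"]
	if !present {
		return true
	}
	return status != "False"
}

func main() {
	fmt.Println(upgradeableFromConditions(nil))                                       // prints true
	fmt.Println(upgradeableFromConditions(map[string]string{"Upgradeable": "False"})) // prints false
}
```

The same comma-ok pattern applies to Progressing, where an absent condition can legitimately default to false (no update in progress).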
---
Duplicate comments:
In `@pkg/readiness/client.go`:
- Around line 153-163: CompareVersions currently masks parse errors by returning
0; change its signature from CompareVersions(a, b string) int to
CompareVersions(a, b string) (int, error) and return a non-nil error when
semver.ParseTolerant(a) or semver.ParseTolerant(b) fails instead of returning 0;
on success return va.Compare(vb) with nil error. Update the function comment to
document the new (int, error) return and then find and update all callers of
CompareVersions to handle the error path (e.g., via the suggested rg search)
before using the comparison result, and ensure any kubernetes.io/description
comments or docstrings near CompareVersions still accurately describe the new
behavior.
- Around line 3-5: FormatLabelSelector currently builds the selector from a map
and relies on Go's map iteration order; change it to collect each "key=value"
(or "key in (v1,v2)" parts) into a slice, sort that slice deterministically
using a stable comparator that first sorts by key and then by value (as a
tiebreaker), and then join the sorted parts to form the selector string. Apply
the same approach to the other selector-construction logic around lines 166-172
(the other function that assembles selector parts) so both functions produce
stable, deterministic output.
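The collect-sort-join shape described above can be sketched as follows; since `=` sorts below letters and digits in ASCII, sorting the joined `key=value` strings effectively orders by key first with value as tiebreaker.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatLabelSelector builds "k=v" pairs, then sorts them so the output does
// not depend on Go's randomized map iteration order.
func formatLabelSelector(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts) // deterministic: same map always yields the same selector
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatLabelSelector(map[string]string{
		"zone": "us-east-1a",
		"app":  "frontend",
	})) // app=frontend,zone=us-east-1a — regardless of iteration order
}
```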
---
Nitpick comments:
In `@pkg/readiness/api_deprecations.go`:
- Around line 20-31: Replace the brittle string match on err.Error() in the
APIRequestCount listing with proper typed checks: when ListResources(ctx, dc,
GVRAPIRequestCount, "") returns an error, test it using meta.IsNoMatchError(err)
and k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) (or the repo helper
isNoMatchError if available) instead of strings.Contains; if either indicates
the resource/GVR is absent, set the same warning/blocker fields and return
result, otherwise propagate the error (the affected symbols are ListResources,
GVRAPIRequestCount and the error handling branch around that call).
In `@pkg/readiness/check.go`:
- Around line 83-151: When a check fails inside RunAll (the goroutine that calls
ch.Run with perCheckTimeout), add a klog emit in the error branch so failures
are visible in cluster logs: detect timeout vs other errors (the err returned
from ch.Run) and call klog.V(4).Infof for timeout-related errors and
klog.Warningf for non-timeout errors, include the check name (ch.Name()), the
error string, elapsed seconds, and any short snippet of result data; keep this
logging inside the same goroutine before populating results[ch.Name()] so the
existing CheckResult building (Status, Error, Data) remains unchanged.
In `@pkg/readiness/checks_test.go`:
- Around line 13-37: The test's gvrs map in newFakeDynamicClient duplicates the
production mapping of GVR→list-kind; add an exported helper (e.g., ListKinds()
returning map[schema.GroupVersionResource]string) in the readiness package
(alongside the GVR constants in client.go), switch newFakeDynamicClient to call
readiness.ListKinds() instead of inlining gvrs, and update production code to
use the same helper so both runtime and tests share one source of truth and
avoid drift when new GVRs are added.
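The shared-helper idea, sketched with a local mirror of `schema.GroupVersionResource` (the real helper would use the apimachinery type and the package's existing GVR constants; the entries below are illustrative):

```go
package main

import "fmt"

// GroupVersionResource mirrors schema.GroupVersionResource for this sketch.
type GroupVersionResource struct{ Group, Version, Resource string }

// ListKinds is the single source of truth mapping each GVR to its list kind.
// Production client setup and the test's fake dynamic client both call it,
// so adding a GVR in one place updates both.
func ListKinds() map[GroupVersionResource]string {
	return map[GroupVersionResource]string{
		{Group: "config.openshift.io", Version: "v1", Resource: "clusterversions"}:  "ClusterVersionList",
		{Group: "config.openshift.io", Version: "v1", Resource: "clusteroperators"}: "ClusterOperatorList",
		{Group: "", Version: "v1", Resource: "nodes"}:                               "NodeList",
	}
}

func main() {
	// A fake dynamic client would be constructed from the same map, e.g.:
	// fake.NewSimpleDynamicClientWithCustomListKinds(scheme, readiness.ListKinds(), objs...)
	fmt.Println(ListKinds()[GroupVersionResource{Group: "", Version: "v1", Resource: "nodes"}])
}
```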
In `@pkg/readiness/client_test.go`:
- Around line 109-126: Replace the manual substring search in the test loop with
strings.Contains by importing "strings" and using if !strings.Contains(got, s)
to check each expected substring for FormatLabelSelector; additionally, make the
"empty" test case assert the actual intended output (e.g., assert got == "" or
whatever the expected string is) instead of leaving contains: []string{} which
makes the inner loop a no-op—update the tests slice entry for the "empty" case
to include the explicit expected value and assert it (refer to the
FormatLabelSelector call and the tests variable in the test function).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 9e893ac2-6d45-4560-b5a8-b5d8d84bdbb4
⛔ Files ignored due to path filters (3)

- `go.sum` is excluded by `!**/*.sum`, `!go.sum`
- `vendor/github.com/openshift/api/features/features.go` is excluded by `!vendor/**`, `!**/vendor/**`
- `vendor/modules.txt` is excluded by `!vendor/**`, `!**/vendor/**`
📒 Files selected for processing (30)

- `AGENTS.md`
- `go.mod`
- `install/0000_00_cluster-version-operator_45_lightspeed-crd-proposals.yaml`
- `install/0000_00_cluster-version-operator_46_lightspeed-crd-agents.yaml`
- `install/0000_00_cluster-version-operator_47_lightspeed-crd-workflows.yaml`
- `install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml`
- `install/0000_00_cluster-version-operator_51_lightspeed-agents.yaml`
- `install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml`
- `pkg/cvo/availableupdates.go`
- `pkg/cvo/cvo.go`
- `pkg/cvo/status_test.go`
- `pkg/featuregates/featuregates.go`
- `pkg/proposal/proposal.go`
- `pkg/proposal/proposal_test.go`
- `pkg/readiness/api_deprecations.go`
- `pkg/readiness/check.go`
- `pkg/readiness/check_test.go`
- `pkg/readiness/checks_test.go`
- `pkg/readiness/client.go`
- `pkg/readiness/client_test.go`
- `pkg/readiness/cluster_conditions.go`
- `pkg/readiness/crd_compat.go`
- `pkg/readiness/etcd_health.go`
- `pkg/readiness/network.go`
- `pkg/readiness/node_capacity.go`
- `pkg/readiness/olm_lifecycle.go`
- `pkg/readiness/olm_lifecycle_test.go`
- `pkg/readiness/operator_health.go`
- `pkg/readiness/pdb_drain.go`
- `test/cvo/readiness.go`
✅ Files skipped from review due to trivial changes (8)
- go.mod
- pkg/cvo/availableupdates.go
- install/0000_00_cluster-version-operator_52_lightspeed-workflows.yaml
- pkg/cvo/status_test.go
- install/0000_00_cluster-version-operator_50_lightspeed-prompts.yaml
- pkg/readiness/network.go
- pkg/readiness/olm_lifecycle_test.go
- pkg/proposal/proposal_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- AGENTS.md
- test/cvo/readiness.go
- pkg/readiness/pdb_drain.go
- pkg/readiness/crd_compat.go
- pkg/readiness/olm_lifecycle.go
```go
for _, u := range au.Updates {
	optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, u.Version, "recommended", config.Spec.Channel, au.Updates, readinessJSON)
}
for _, u := range au.ConditionalUpdates {
	optr.proposalCreator.MaybeCreateProposal(ctx, optr.release.Version, u.Release.Version, "conditional", config.Spec.Channel, au.Updates, readinessJSON)
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Inspect the MaybeCreateProposal signature and how the updates argument is used.
ast-grep --pattern $'func ($_ *Creator) MaybeCreateProposal($$$) {
$$$
}'
rg -nP --type=go -C5 'MaybeCreateProposal\s*\(' pkg/proposal
rg -nP --type=go -C3 '\bbuildRequest\b|\bclassifyUpdate\b' pkg/proposal
```

Repository: openshift/cluster-version-operator
Length of output: 9919
🏁 Script executed:
```shell
cat -n pkg/proposal/proposal.go | sed -n '187,250p'
```

Repository: openshift/cluster-version-operator
Length of output: 2565
🏁 Script executed:
```shell
# Check the context around lines 1247-1252 and how au is populated
rg -n 'au\.Updates|au\.ConditionalUpdates' pkg/cvo/cvo.go -B 20 -A 5 | head -120
```

Repository: openshift/cluster-version-operator
Length of output: 2424
🏁 Script executed:
```shell
# Find the type definition for au (returned by getAvailableUpdates)
rg -n 'func.*getAvailableUpdates|type.*Updates' pkg/cvo/cvo.go | head -30
# Also look for where Updates and ConditionalUpdates are used together
rg -n 'ConditionalUpdates' pkg/cvo/ -B 3 -A 3
```

Repository: openshift/cluster-version-operator
Length of output: 32273
🏁 Script executed:
```shell
rg -n 'func SelectTarget' pkg/ -A 15
```

Repository: openshift/cluster-version-operator
Length of output: 1185
🏁 Script executed:
```shell
rg -n 'maybeCreateLightspeedProposals|Test.*Lightspeed' pkg/cvo/ -A 20 | head -80
```

Repository: openshift/cluster-version-operator
Length of output: 2858
🏁 Script executed:
```shell
# Search for tests that specifically test proposal creation with conditional updates
rg -n 'TestMaybeCreateLightspeed|TestMaybeCreate.*Proposal|ConditionalUpdates.*Proposal|Proposal.*Conditional' pkg/cvo/ -B 2 -A 10
```

Repository: openshift/cluster-version-operator
Length of output: 60
🏁 Script executed:
```shell
# Check the proposal package tests more broadly
rg -n 'func Test' pkg/proposal/proposal_test.go | head -20
# Also check for any comments explaining the design
rg -n 'conditional.*recommended|recommended.*conditional' pkg/proposal/ -B 2 -A 2
```

Repository: openshift/cluster-version-operator
Length of output: 2400
🏁 Script executed:
```shell
# Check if there are any comments or design decisions about how updates are passed to proposals
rg -n 'Other.*recommended|context|graph' pkg/proposal/proposal.go -B 2 -A 2
# Also verify the structure of ConditionalUpdate to understand the difference from Release
rg -n 'type ConditionalUpdate|type Release' --type=go | grep -E 'configv1|proposal'
```

Repository: openshift/cluster-version-operator
Length of output: 1093
🏁 Script executed:
```shell
# Let me check if there's any discussion in comments or in how evaluateConditionalUpdates works
# to understand if conditional updates should be part of the context
rg -n 'evaluateConditionalUpdates' pkg/cvo/availableupdates.go -B 5 -A 20 | head -50
```

Repository: openshift/cluster-version-operator
Length of output: 2365
🏁 Script executed:
```shell
# Check if there are any other calls to MaybeCreateProposal or related patterns in the codebase
rg -n 'MaybeCreateProposal|buildRequest' pkg/ --type=go | grep -v test | grep -v '//'
```

Repository: openshift/cluster-version-operator
Length of output: 832
Conditional update proposals should include context about other available updates, both recommended and conditional.
Line 1251 passes au.Updates (recommended only) when creating proposals for conditional targets. The buildRequest function (lines 210–231) uses this parameter to list "Other recommended versions available" in the proposal context. For conditional targets, this means the AI evaluates the conditional update in isolation from other conditional alternatives—it only sees recommended versions as context.
Consider passing a combined/deduplicated slice containing both au.Updates and the versions from au.ConditionalUpdates for the conditional loop, or extend buildRequest to accept conditional updates separately and include them in the proposal request.
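The combined-slice option can be sketched as a small helper; `mergeUpdateVersions` is a hypothetical name, operating on plain version strings rather than the actual `configv1` types for brevity:

```go
package main

import "fmt"

// mergeUpdateVersions combines recommended and conditional target versions
// into one deduplicated, order-preserving slice, so a conditional proposal's
// "other versions available" context lists both kinds of alternatives.
func mergeUpdateVersions(recommended, conditional []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, v := range append(append([]string{}, recommended...), conditional...) {
		if !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	fmt.Println(mergeUpdateVersions(
		[]string{"4.17.3", "4.17.4"},
		[]string{"4.17.4", "4.18.0"}, // 4.17.4 appears in both lists
	)) // [4.17.3 4.17.4 4.18.0]
}
```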
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/cvo/cvo.go` around lines 1247 - 1252, When creating proposals for
conditional updates, pass combined context containing both recommended and
conditional versions instead of only au.Updates: inside the loop that calls
optr.proposalCreator.MaybeCreateProposal for each entry of
au.ConditionalUpdates, build a deduplicated slice that merges au.Updates and the
versions from au.ConditionalUpdates (or alternatively extend the buildRequest
signature to accept conditional updates) and pass that merged slice as the
"other updates" argument so buildRequest can list both recommended and
conditional alternatives when generating the proposal.
```go
func (c *Creator) readSystemPrompt(ctx context.Context) string {
	if c.config.PromptConfigMap == "" {
		return ""
	}
	obj, err := c.client.Resource(configMapGVR).Namespace(c.config.Namespace).Get(ctx, c.config.PromptConfigMap, metav1.GetOptions{})
	if err != nil {
		klog.V(4).Infof("Could not read system prompt ConfigMap %s/%s: %v", c.config.Namespace, c.config.PromptConfigMap, err)
		return ""
	}
	data, _, _ := unstructured.NestedMap(obj.Object, "data")
	if prompt, ok := data["prompt"].(string); ok {
		return prompt
	}
	return ""
}
```
Silent empty prompt obscures misconfiguration.
When the ConfigMap exists but is missing the prompt key (or the key isn't a string), readSystemPrompt silently returns "" and the final proposal is created without any system-prompt context — a config bug that's invisible until an operator reads the produced Proposal.spec.request by hand. Log at klog.V(2) or klog.Warningf for this branch to make the misconfiguration discoverable.
🛡️ Proposed fix
```diff
-	data, _, _ := unstructured.NestedMap(obj.Object, "data")
-	if prompt, ok := data["prompt"].(string); ok {
-		return prompt
-	}
-	return ""
+	data, _, _ := unstructured.NestedMap(obj.Object, "data")
+	if prompt, ok := data["prompt"].(string); ok && prompt != "" {
+		return prompt
+	}
+	klog.Warningf("System prompt ConfigMap %s/%s missing non-empty 'prompt' key; proposals will be created without a system prompt",
+		c.config.Namespace, c.config.PromptConfigMap)
+	return ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/proposal/proposal.go` around lines 171 - 185, In
Creator.readSystemPrompt, currently when the ConfigMap exists but the "prompt"
key is missing or not a string the code silently returns "", so change the
function to log that misconfiguration before returning: detect the
missing-or-wrong-type branch after unstructured.NestedMap(obj.Object, "data")
and emit a klog.V(2).Infof or klog.Warningf including c.config.Namespace,
c.config.PromptConfigMap and a clear message that the "prompt" key is missing or
not a string (reference symbols: readSystemPrompt, Creator,
unstructured.NestedMap); keep behavior of returning "" but ensure the log is
written to make the issue discoverable.
```go
upgradeable := conditions[ConditionUpgradeable]
result["upgradeable"] = map[string]any{
	"status":  upgradeable.Status,
	"reason":  upgradeable.Reason,
	"message": upgradeable.Message,
}

progressing := conditions[ConditionProgressing]
result["update_in_progress"] = progressing.Status == ConditionTrue
```
Absent Upgradeable condition is reported as not upgradeable.
conditions[ConditionUpgradeable] yields a zero-value struct when the condition is missing, so result["upgradeable"].status becomes "" and summary.upgradeable becomes false. Per OpenShift convention, the absence of Upgradeable means the cluster is upgradeable (only Upgradeable=False blocks). With the current code, a healthy cluster that simply hasn't set the condition will be reported to the agent as non-upgradeable, which flips the downstream decision.
Consider treating missing/absent as "upgradeable = true" (or at least "unknown") and only setting false when the condition is explicitly False. Same consideration applies if you care about distinguishing "no Progressing condition" from "Progressing=False", though defaulting that one to false is reasonable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/readiness/cluster_conditions.go` around lines 34 - 42, The code treats a
missing Upgradeable condition (conditions[ConditionUpgradeable]) as a zero-value
and reports non-upgradeable; change the logic that builds result["upgradeable"]
(and any use of summary.upgradeable) to first check whether the
ConditionUpgradeable key exists in the conditions map, and if absent default to
upgradeable=true (or status "True"/"Unknown" per your conventions), only setting
upgradeable=false when the condition is present and its Status ==
ConditionFalse; similarly, for ConditionProgressing handle missing keys
explicitly (keep default false if you prefer) instead of relying on zero-value
semantics so result["update_in_progress"] accurately reflects an explicit
Progressing=False vs missing condition.
PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Summary

- `LightspeedProposals` feature gate integration — when enabled, CVO creates `Proposal` CRs (`agentic.openshift.io/v1alpha1`) after discovering available updates
- New `pkg/readiness/` package runs 9 parallel cluster readiness checks (operator health, etcd, nodes, PDBs, API deprecations, CRD compat, network, OLM lifecycle, cluster conditions) and embeds the results in the proposal
- New `pkg/proposal/` package handles proposal creation with deterministic naming and `AlreadyExists`-based dedup
- Install manifests are annotated with the gate (`release.openshift.io/feature-gate: LightspeedProposals`)
- Gate belongs to the `CustomNoUpgrade` feature set — no impact on default clusters

Test plan

- `pkg/proposal/` (SelectTarget, dedup, buildRequest, classifyUpdate, sanitize, readSystemPrompt)
- `pkg/readiness/` (all 9 checks, RunAll orchestration, JSON marshaling, client helpers)
- `pkg/featuregates/` (gate parsing, feature change stopper)
- `test/cvo/readiness.go` (validates checks against real cluster with ground-truth comparisons)

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests